# TIDE: An Innovative Methodology to Compress LLM Inference Performance into a Single Comparable Score

> TIDE is a new LLM inference performance evaluation method that compresses full scan results across concurrency, tensor parallelism, input/output lengths, and model variants into a single comparable numerical score while preserving context-aware diagnostic information.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-15T20:15:03.000Z
- Last activity: 2026-05-15T20:17:42.761Z
- Heat: 160.0
- Keywords: LLM inference, performance evaluation, TIDE, throughput, interactivity, concurrency optimization, large language models, inference benchmarking
- Page link: https://www.zingnex.cn/en/forum/thread/tide-llm
- Canonical: https://www.zingnex.cn/forum/thread/tide-llm

---

## [Main Floor] TIDE: An Innovative Methodology for LLM Inference Performance Evaluation—Single Comparable Score & Context-Aware Diagnosis

This article introduces TIDE (Throughput × Interactivity Density Envelope), an innovative method for evaluating LLM inference performance. It addresses the limitations of single-dimensional metrics in traditional evaluations by compressing multi-dimensional scan results (including concurrency, tensor parallelism, input/output lengths, etc.) into a single comparable score, while retaining context-aware diagnostic information to help developers fairly compare performance across different hardware, concurrency levels, and model sizes.

## Background: Existing Challenges in LLM Inference Performance Evaluation

In the field of LLM inference performance evaluation, developers have long faced the challenge of fair comparison: traditional methods focus on a single dimension (e.g., tokens generated per second) while ignoring key factors such as interaction latency and concurrency scalability. This makes it difficult to compare performance meaningfully across different hardware configurations, concurrency levels, and model sizes.

## Core Concepts: Composition and Calculation of TIDE Score

The core of TIDE is to compress scan results spanning concurrency × tensor parallelism × input sequence length × output sequence length × model into a single score. It consists of two phase scores:

- **TIDE_decode**: based on per-GPU output throughput (output tokens/sec/GPU) and interactivity (1/TPOT, the reciprocal of time per output token)
- **TIDE_prefill**: based on per-GPU input throughput (input tokens/sec/GPU) and interactivity (ISL/TTFT, input sequence length over time to first token)

Both phases use a hierarchical geometric mean: compute first by concurrency context, then by cell, and finally aggregate by model, as sketched below.
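
To make the aggregation concrete, here is a minimal Python sketch. It assumes, since the post does not spell this out, that each operating point contributes throughput × interactivity (matching the "Throughput × Interactivity Density Envelope" name), and it uses the same `statistics.geometric_mean` the post says the real implementation relies on:

```python
import statistics
from collections import defaultdict

def point_score(output_tps_per_gpu: float, tpot_s: float) -> float:
    # Assumed per-point contribution for the decode phase:
    # throughput x interactivity, where interactivity = 1 / TPOT.
    return output_tps_per_gpu * (1.0 / tpot_s)

def tide_decode(points) -> float:
    """points: iterable of (model, cell, context, output_tps_per_gpu, tpot_s).

    Hierarchical geometric mean: context -> cell -> model -> overall,
    mirroring the aggregation order described above.
    """
    by_ctx = defaultdict(list)
    for model, cell, ctx, tps, tpot in points:
        by_ctx[(model, cell, ctx)].append(point_score(tps, tpot))

    # 1) geometric mean over the operating points in each concurrency context
    ctx_scores = {k: statistics.geometric_mean(v) for k, v in by_ctx.items()}

    # 2) geometric mean across contexts -> one score per cell
    by_cell = defaultdict(list)
    for (model, cell, _ctx), s in ctx_scores.items():
        by_cell[(model, cell)].append(s)
    cell_scores = {k: statistics.geometric_mean(v) for k, v in by_cell.items()}

    # 3) geometric mean across cells -> one score per model
    by_model = defaultdict(list)
    for (model, _cell), s in cell_scores.items():
        by_model[model].append(s)
    model_scores = {m: statistics.geometric_mean(v) for m, v in by_model.items()}

    # 4) geometric mean across models -> the single TIDE_decode number
    return statistics.geometric_mean(list(model_scores.values()))
```

TIDE_prefill follows the same pattern, with input tokens/sec/GPU as the throughput term and ISL/TTFT as the interactivity term.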

## Context Awareness: Four-Dimensional Concurrency Context Division

TIDE divides concurrency into four logarithmically uniform intervals (each spanning a factor of 4) to achieve context awareness:

- R1 [1-4]: interactive context (real-time dialogue, low-latency applications)
- R2 [5-16]: lightweight multi-user context (small-to-medium service deployments)
- R3 [17-64]: medium batch-processing context (high-throughput scenarios)
- R4 [65-256]: heavy batch-processing context (large-scale offline processing)

Each context's geometric mean is computed independently, which helps pinpoint the specific scenario in which a performance change occurs. A bucketing helper follows.
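
A minimal helper that directly encodes the interval bounds listed above (the function name is illustrative, not taken from the TIDE codebase):

```python
def concurrency_context(concurrency: int) -> str:
    """Map a concurrency level to its TIDE context bucket.

    The four buckets are logarithmically uniform: each spans a
    factor of 4 (1-4, 5-16, 17-64, 65-256).
    """
    if 1 <= concurrency <= 4:
        return "R1"  # interactive: real-time dialogue, low latency
    if concurrency <= 16:
        return "R2"  # lightweight multi-user: small/medium deployments
    if concurrency <= 64:
        return "R3"  # medium batch: high-throughput scenarios
    if concurrency <= 256:
        return "R4"  # heavy batch: large-scale offline processing
    raise ValueError(f"concurrency {concurrency} is outside the scanned range [1, 256]")
```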

## Practical Application: TIDE Score Example for MI355x Hardware

The TIDE toolchain can process InferenceX data. Below is a score example for MI355x hardware:

| Phase   | Total Score | R1      | R2        | R3        | R4        |
|---------|-------------|---------|-----------|-----------|-----------|
| Decode  | 7,327       | 5,215   | 7,509     | 10,741    | 14,741    |
| Prefill | 991,228     | 710,965 | 1,376,954 | 1,760,795 | 1,842,960 |

Scores for both phases rise with concurrency, but they scale differently: decode keeps growing through R4 (roughly 1.4× per context), while prefill nearly saturates at high concurrency, gaining under 5% from R3 to R4. The quick check below makes this concrete.
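
A quick sanity check, using only the numbers quoted above, shows the diverging scaling behavior between adjacent contexts:

```python
# Context-to-context growth ratios from the MI355x example above.
decode  = {"R1": 5_215, "R2": 7_509, "R3": 10_741, "R4": 14_741}
prefill = {"R1": 710_965, "R2": 1_376_954, "R3": 1_760_795, "R4": 1_842_960}

for name, scores in (("decode", decode), ("prefill", prefill)):
    vals = list(scores.values())
    ratios = [b / a for a, b in zip(vals, vals[1:])]
    print(name, [f"{r:.2f}x" for r in ratios])

# decode  ['1.44x', '1.43x', '1.37x']  -> keeps scaling through R4
# prefill ['1.94x', '1.28x', '1.05x']  -> nearly flat from R3 to R4
```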

## Toolchain and Visualization Report Support

TIDE provides a complete Python toolchain:

1. `fetch_inferencex_dump.sh`: downloads weekly database dumps from InferenceX
2. `score_inferencex.py`: computes scores and generates reports
3. `compare_inferencex.py`: compares performance across different points in time

Adding the `--pdf` flag generates a visual report with an overview page (total score plus per-context bar chart), a model breakdown page (log-scale bar chart), and a heatmap page (model × context diagnostic detail). A hypothetical end-to-end invocation is sketched below.
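
A hypothetical end-to-end run, driven from Python so the whole pipeline stays in one script; the exact arguments are assumptions (the post only names the three tools and the `--pdf` flag), so check each script's `--help`:

```python
import subprocess

# 1) Fetch the latest weekly InferenceX dump (script named in the post).
subprocess.run(["./fetch_inferencex_dump.sh"], check=True)

# 2) Score the dump and emit the PDF report; the dump filename here
#    is illustrative, not a documented default.
subprocess.run(
    ["python", "score_inferencex.py", "inferencex_dump.db", "--pdf"],
    check=True,
)

# 3) Compare two dumps from different weeks (argument order is assumed).
subprocess.run(
    ["python", "compare_inferencex.py", "dump_prev.db", "dump_latest.db"],
    check=True,
)
```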

## Technical Implementation and Extensibility

The core of TIDE scoring (`tide/score.py`) is data source-agnostic, supporting any loader that outputs the `dict[Cell, list[OperatingPoint]]` format, and can be extended to other benchmark platforms. The core algorithm relies on Python 3.9+'s `statistics.geometric_mean`; PDF report generation requires matplotlib, but the scoring core only uses standard libraries.
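
To make the loader contract concrete, here is a skeleton of a custom loader. The post names the `Cell` and `OperatingPoint` types but not their fields, so the fields below are assumptions inferred from the dimensions TIDE scans:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    # Assumed fields: one cell per (model, TP, ISL, OSL) combination.
    model: str  # model variant
    tp: int     # tensor-parallel degree
    isl: int    # input sequence length
    osl: int    # output sequence length

@dataclass
class OperatingPoint:
    # Assumed fields: one measurement per concurrency level within a cell.
    concurrency: int
    output_tps_per_gpu: float  # decode throughput
    input_tps_per_gpu: float   # prefill throughput
    tpot_s: float              # time per output token, seconds
    ttft_s: float              # time to first token, seconds

def load_my_benchmark(path: str) -> dict[Cell, list[OperatingPoint]]:
    """Parse your own benchmark output into the mapping tide/score.py
    consumes; per the post, the scorer is agnostic to the data source."""
    raise NotImplementedError  # fill in for your platform
```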

## Implications for LLM Inference Optimization

The TIDE methodology provides the following guidance for optimization:
1. **Comprehensive Evaluation**: Avoid optimizing for a single metric; ensure good performance across all contexts
2. **Regression Detection**: Capture performance regressions under specific configurations
3. **Contextual Optimization**: Targeted optimization for specific scenarios
4. **Cross-Platform Comparison**: Use a unified standard for fair comparison

These points help developers optimize LLM inference systems more effectively.
