Zing Forum

MBE Protocol: Establishing a Standardized Evaluation System for KV Cache Compression in Large Models

Matched-Budget Evaluation (MBE) is a standardized fixed-budget reporting protocol and open-source evaluation framework for KV cache compression methods in large language models. It addresses the problem that evaluation results from academia and industry cannot be compared directly.

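Tags: KV cache compression, large language models, evaluation protocol, LLM inference optimization, open-source framework, standardized evaluation
Published 2026-06-12 07:44 | Recent activity 2026-06-12 07:49 | Estimated read 5 min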

Section 01

MBE Protocol: Introduction to the Standardized Evaluation System for KV Cache Compression in Large Models

The Matched-Budget Evaluation (MBE) protocol is a standardized fixed-budget reporting protocol and open-source evaluation framework for KV cache compression methods in large language models. It targets the fragmentation of the KV cache compression field, where results from different studies cannot be compared. Its core idea is to compare methods under the same reserved KV memory budget. With fixed budget tiers and a multi-dimensional evaluation matrix, results from different studies become directly comparable.

Section 02

Background: The Fragmentation Problem in KV Cache Compression Evaluation

In LLM inference, the KV cache is a main source of memory consumption, and because it grows linearly with sequence length it becomes a bottleneck at long contexts. Many compression methods exist, such as quantization and pruning, but studies differ in the models, tasks, and metrics they use, and some lack systematic measurement altogether. As a result, published results cannot be compared directly, and researchers and engineers have difficulty selecting an appropriate method.
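
The linear growth mentioned above can be made concrete with the standard size formula for a transformer KV cache. The sketch below is illustrative only: the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit elements) is an assumed 8B-class GQA configuration and is not taken from the MBE material.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Size of an uncompressed KV cache in bytes.

    The leading factor 2 accounts for storing both keys and values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Assumed, illustrative model shape (not from the MBE protocol itself).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:.2f} GiB per sequence")
```

Under these assumptions each token costs 128 KiB, so the cache for a single sequence grows from 0.5 GiB at 4,096 tokens to 16 GiB at 131,072 tokens.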

Section 03

MBE Core Idea and Standardized Budget Tiers

The core of MBE is to compare methods under the same reserved KV memory budget. It is not a new benchmark but a lightweight reporting layer that is compatible with existing task suites such as LongBench and GSM8K. It defines fixed budget tiers: B50 (50%), B25 (25%), B12 (12.5%), and the optional B06 (6.25%). Reporting at these tiers makes it possible to trace how performance changes as compression becomes more aggressive.
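
As a rough illustration of what a matched budget means, the sketch below maps the tier names to fractions and shows how two different kinds of method can land on the same tier. It assumes that each tier is a fraction of the uncompressed 16-bit KV cache and ignores quantization metadata such as scales and zero points; how MBE itself accounts for such overhead is not stated here. The function names are hypothetical.

```python
# Tier names and percentages as listed in the protocol description.
BUDGET_TIERS = {"B50": 0.5, "B25": 0.25, "B12": 0.125, "B06": 0.0625}


def reserved_budget_bytes(full_kv_bytes, tier):
    """Bytes available to the KV cache at a tier (assumed: fraction of the full cache)."""
    return int(full_kv_bytes * BUDGET_TIERS[tier])


def tokens_retained(tier, seq_len, bits_per_elem=16, baseline_bits=16):
    """How many tokens fit in the tier's budget at a given storage precision.

    Metadata overhead (scales, zero points, indices) is ignored in this sketch.
    """
    fraction = BUDGET_TIERS[tier]
    fit = int(seq_len * fraction * baseline_bits / bits_per_elem)
    return min(seq_len, fit)


seq_len = 32_768
# Eviction at 16-bit precision: only a quarter of the tokens fit in B25.
print(tokens_retained("B25", seq_len))
# 4-bit quantization: all tokens fit in the same B25 budget.
print(tokens_retained("B25", seq_len, bits_per_elem=4))
```

The point of the comparison is that an eviction method and a quantization method spend the same memory in very different ways, which is why reporting both at the same tier is more informative than reporting each at its own preferred setting.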

Section 04

MBE's Comprehensive Evaluation Dimension Matrix

MBE requires reporting multi-dimensional metrics at each budget point:

  • Model dimension: covers 7-8B GQA, 7-14B, and ≥70B models
  • Task dimension: retrieval, aggregation/tracking, instruction following, reasoning, and agent/multi-turn tasks
  • System dimension: peak memory, throughput, time to first token (TTFT), maximum batch size, and hardware tier
  • Method dimension: deployment prerequisites (training-free, calibration, or pretraining) and composability with other methods

Section 05

MBE Open-Source Evaluation Framework Design

MBE provides an adapter-based open-source framework. Researchers only need to implement the KVCompressor interface, and the framework automatically handles budget sweeps, task execution, and metric collection. Built-in reference adapters include KIVI (2-bit quantization), H2O (dynamic eviction), SnapKV, StreamingLLM, and PyramidKV, which lowers the barrier to running an evaluation.
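
The adapter idea can be sketched as follows. The real KVCompressor signature is not given in this summary, so the method name `select` and its parameters are assumptions made for illustration. The example adapter is a toy policy in the spirit of StreamingLLM (keep a few initial "sink" tokens plus the most recent window); it is not the framework's reference adapter.

```python
from abc import ABC, abstractmethod
from typing import List


class KVCompressor(ABC):
    """Hypothetical interface; the actual MBE signature may differ."""

    @abstractmethod
    def select(self, num_tokens: int, budget_fraction: float) -> List[int]:
        """Return sorted token positions whose KV entries are kept."""


class SinkAndRecentWindow(KVCompressor):
    """Toy eviction policy: keep the first few tokens and the most recent ones."""

    def __init__(self, num_sink_tokens: int = 4):
        self.num_sink_tokens = num_sink_tokens

    def select(self, num_tokens: int, budget_fraction: float) -> List[int]:
        # Number of positions the budget allows, clamped to a valid range.
        keep = max(0, min(int(num_tokens * budget_fraction), num_tokens))
        sinks = min(self.num_sink_tokens, keep)
        recent = keep - sinks
        kept = set(range(sinks)) | set(range(num_tokens - recent, num_tokens))
        return sorted(kept)


# A minimal budget sweep, standing in for what the framework automates.
compressor = SinkAndRecentWindow()
for tier, fraction in {"B50": 0.5, "B25": 0.25, "B12": 0.125}.items():
    kept = compressor.select(num_tokens=64, budget_fraction=fraction)
    print(tier, len(kept))
```

Keeping the interface this narrow means the sweep loop never needs to know whether a method evicts, quantizes, or merges entries; it only needs each adapter to respect the budget it is given.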

Section 06

MBE Community Contribution and Quick Start

MBE adopts an open contribution model. Researchers submit evaluation cards through a pull request (PR), and CI automatically updates the leaderboard. Quick start steps:

  1. Configure the methods and run parameters in a YAML file
  2. Run run_mbe.py to generate evaluation cards
  3. Render the cards and submit a PR
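
To give a feel for step 1, the sketch below shows the kind of information such a configuration might carry and how it expands into individual runs. It is written as a Python dictionary rather than YAML, and every key, method label, and model name is a hypothetical placeholder; the actual configuration schema and the options accepted by run_mbe.py are not described in this summary.

```python
from itertools import product

# Hypothetical configuration; keys and values are illustrative placeholders.
config = {
    "model": "example-8b-gqa",
    "methods": ["kivi", "h2o", "snapkv"],
    "tiers": ["B50", "B25", "B12"],
    "tasks": ["retrieval", "reasoning"],
}


def expand_runs(cfg):
    """Expand a configuration into one run per (method, tier, task) combination."""
    return [
        {"model": cfg["model"], "method": method, "tier": tier, "task": task}
        for method, tier, task in product(cfg["methods"], cfg["tiers"], cfg["tasks"])
    ]


runs = expand_runs(config)
print(len(runs))   # 3 methods x 3 tiers x 2 tasks = 18
print(runs[0])
```

Running every method at every tier on every task is what makes the resulting cards comparable, since no method gets to choose only the settings that favor it.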

Section 07

MBE's Significance and Future Outlook

MBE addresses the fragmentation of KV cache compression evaluation and also offers a model for collaborative research. Industry can select methods on objective grounds, and academia faces a lower barrier to running evaluations. As LLM context windows expand, KV compression becomes more important, and MBE is expected to become shared infrastructure for the field, promoting more comparable and reproducible research.