Section 01
[Introduction] VaSE: Value-Aware Stochastic KV Cache Eviction Strategy Boosts Reasoning Model Performance
VaSE targets the KV cache memory bottleneck created by the long output sequences of reasoning models. It proposes a value-aware stochastic KV cache eviction strategy: protecting large-value states preserves reasoning coherence, while introducing randomness increases the diversity of the retained cache. Under 4x KV cache compression, the reasoning model's average accuracy across six reasoning tasks exceeds that of SOTA selection methods and beats the strongest eviction method by over 4%. The strategy requires no training to deploy.
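To make the two ingredients concrete, here is a minimal sketch of one way such a policy could look. It is not the VaSE implementation, and the summary above does not give the actual scoring rule. The sketch assumes that "large-value" means a large L2 norm of a token's value vector, that a fixed fraction of the budget is reserved for those protected tokens, and that the remaining slots are filled by uniform random sampling. The function name `select_kept_positions` and the parameter `protect_frac` are hypothetical.

```python
import math
import random


def select_kept_positions(values, budget, protect_frac=0.5, seed=None):
    """Pick which cache positions to keep under a fixed budget.

    values: list of per-token value vectors (lists of floats).
    budget: number of positions to retain after eviction.
    protect_frac: share of the budget reserved for protected tokens
        (hypothetical knob, not taken from the source).
    """
    rng = random.Random(seed)
    n = len(values)
    if budget >= n:
        # Nothing needs to be evicted.
        return list(range(n))

    # Assumption: importance is the L2 norm of the value vector.
    norms = [math.sqrt(sum(x * x for x in v)) for v in values]
    order = sorted(range(n), key=lambda i: norms[i], reverse=True)

    # Protected part: the highest-norm positions are always kept.
    n_protect = int(budget * protect_frac)
    protected = order[:n_protect]

    # Stochastic part: fill the rest of the budget uniformly at random
    # from the unprotected positions, without replacement.
    sampled = rng.sample(order[n_protect:], budget - n_protect)

    return sorted(protected + sampled)


# Toy example: 16 cached tokens, keep 4 (4x compression).
values = [[float(i), 0.0] for i in range(16)]
kept = select_kept_positions(values, budget=4, seed=0)
```

In this toy run, positions 14 and 15 have the largest norms and are always retained, while the other two kept positions vary with the seed. A real deployment would apply a rule like this per attention head and layer, which the sketch leaves out.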