Section 01
Introduction: ReasonAlloc, a New Paradigm for Breaking the KV Cache Memory Bottleneck of Reasoning Models
ReasonAlloc is a training-agnostic, hierarchical budget allocation framework that targets the KV cache explosion caused by long chain-of-thought generation in reasoning models. It combines two strategies: offline layer pre-allocation, which captures the "reasoning wave" pattern, and online head reallocation, which optimizes resources dynamically during inference. Together they significantly reduce KV cache pressure, especially in small-budget scenarios. The framework is compatible with existing compression methods and adds negligible inference overhead.
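To make the two-level idea concrete, here is a minimal sketch of hierarchical budget allocation: a total KV cache budget is first split across layers, then each layer's share is split across its heads. This is an illustration under stated assumptions, not ReasonAlloc's actual algorithm. The names `split_budget`, `allocate_budgets`, `layer_profile`, and `head_scores` are hypothetical, and simple proportional splitting with largest-remainder rounding stands in for whatever allocation rule the framework really uses.

```python
from typing import List


def split_budget(total: int, weights: List[float]) -> List[int]:
    """Split an integer budget proportionally to non-negative weights.

    Largest-remainder rounding keeps the parts summing exactly to `total`.
    (Assumed rule for illustration; the source does not specify one.)
    """
    if total < 0 or not weights or any(w < 0 for w in weights):
        raise ValueError("need total >= 0 and a non-empty list of weights >= 0")
    weight_sum = sum(weights)
    if weight_sum == 0:
        # No signal at all: fall back to a uniform split.
        weights = [1.0] * len(weights)
        weight_sum = float(len(weights))
    exact = [total * w / weight_sum for w in weights]
    parts = [int(x) for x in exact]  # floor, since every value is >= 0
    leftover = max(0, total - sum(parts))
    # Hand the remaining slots to the entries with the largest fractional part.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - parts[i], reverse=True)
    for i in order[:leftover]:
        parts[i] += 1
    return parts


def allocate_budgets(total_budget: int,
                     layer_profile: List[float],
                     head_scores: List[List[float]]) -> List[List[int]]:
    """Two-level allocation: layers first, then heads within each layer.

    layer_profile: hypothetical per-layer weights computed offline.
    head_scores:   hypothetical per-head importance scores measured online.
    """
    if len(layer_profile) != len(head_scores):
        raise ValueError("one list of head scores is required per layer")
    layer_budgets = split_budget(total_budget, layer_profile)
    return [split_budget(budget, scores)
            for budget, scores in zip(layer_budgets, head_scores)]


if __name__ == "__main__":
    # Toy example: 2 layers with 2 heads each and 16 cache slots in total.
    print(allocate_budgets(16, [1.0, 3.0], [[1.0, 1.0], [1.0, 3.0]]))
```

In this sketch the layer-level weights are fixed ahead of time and only the head-level scores change at run time, which mirrors the offline/online division described above. An actual system would also need to decide which cached tokens each head keeps within its budget, a step that existing compression methods could handle.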