Section 01
[Introduction] TriAxialKV: A New KV Cache Quantization Scheme for Agent Reasoning, 4.5x Compression + 30% Throughput Increase
TriAxialKV is a tri-axial mixed-precision KV cache quantization method for agent reasoning tasks. It assigns each token INT2 or INT4 precision according to three axes: temporal proximity, modality type, and semantic role. The reported result is 4.5x KV cache compression and a 30% throughput increase with reasoning accuracy maintained, which targets the memory bottleneck in agent reasoning.
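To make the idea concrete, here is a minimal Python sketch of per-token precision assignment along the three axes. It is not the authors' algorithm: the summary above does not say how the axes are combined, so the rule in `assign_bits`, the `recent_window` parameter, and the modality and role labels are all hypothetical. The compression estimate also assumes a 16-bit baseline and ignores quantization metadata (scales, zero points), neither of which is stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    age: int        # decoding steps since the token was cached (temporal proximity)
    modality: str   # hypothetical labels, e.g. "text" or "image"
    role: str       # hypothetical labels, e.g. "instruction", "reasoning", "observation"


def assign_bits(tok: TokenInfo, recent_window: int = 128) -> int:
    """Return 4 (INT4) or 2 (INT2) for a token's KV entries.

    Hypothetical rule for illustration only: a token keeps INT4 if any
    axis marks it as important, and falls back to INT2 otherwise.
    """
    if tok.age < recent_window:                            # temporal axis: recent tokens
        return 4
    if tok.role == "instruction":                          # semantic axis: task instructions
        return 4
    if tok.modality == "text" and tok.role == "reasoning":  # modality + role combined
        return 4
    return 2


def compression_ratio(bits: list[int], baseline_bits: int = 16) -> float:
    """Compression versus an assumed 16-bit cache, ignoring scale/zero-point overhead."""
    avg_bits = sum(bits) / len(bits)
    return baseline_bits / avg_bits


if __name__ == "__main__":
    tokens = [
        TokenInfo(age=3, modality="text", role="reasoning"),
        TokenInfo(age=900, modality="image", role="observation"),
        TokenInfo(age=2000, modality="text", role="instruction"),
    ]
    bits = [assign_bits(t) for t in tokens]
    print(bits, compression_ratio(bits))
```

Under these assumptions, a 4.5x ratio corresponds to an average of 16 / 4.5 ≈ 3.56 bits per value, which a pure INT2/INT4 mix reaches when about 7 of every 9 values stay at INT4. The real scheme's split may differ once metadata overhead and the actual baseline precision are taken into account.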