Section 01
VeriCache: A Validation Framework for Lossless LLM Inference with Compressed KV Cache (Introduction)
VeriCache addresses the error accumulation that conventional KV cache compression suffers from during long-sequence generation. Its core idea is to draft tokens with a compressed KV cache and then verify them against the full KV cache, so the final output stays consistent with full-precision inference while throughput improves by up to 4x. This resolves the trade-off between the memory cost of keeping the full cache and the quality risk of relying on a compressed one.
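The draft-then-verify idea can be illustrated with a minimal toy sketch. It is not VeriCache's actual implementation: the names `generate_verified`, `full_next`, `draft_next`, and `draft_len` are hypothetical, the two "models" are simple deterministic functions standing in for greedy decoding with a full and a compressed KV cache, and verification is done token by token here, whereas a real system would presumably check all drafted tokens in a single batched forward pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def full_next(ctx: List[int]) -> int:
    """Toy stand-in for greedy decoding with the full KV cache (assumption)."""
    return (31 * sum(ctx) + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    """Toy stand-in for decoding with a compressed KV cache (assumption).

    It agrees with full_next except when the context length is a multiple
    of 5, which imitates occasional compression error.
    """
    tok = full_next(ctx)
    return (tok + 1) % 50 if len(ctx) % 5 == 0 else tok


def greedy_full(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Reference output: greedy decoding that uses only the full model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(full_next(tokens))
    return tokens


def generate_verified(
    full: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    draft_len: int = 4,
) -> List[int]:
    """Draft tokens with the cheap model, keep only what the full model confirms."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # 1) Draft up to draft_len tokens with the compressed-cache model.
        k = min(draft_len, target_len - len(tokens))
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(tokens + drafted))
        # 2) Verify each drafted token against the full model. On the first
        #    mismatch, keep the full model's token and discard the rest.
        for tok in drafted:
            expected = full(tokens)
            tokens.append(expected)
            if expected != tok:
                break
    return tokens


if __name__ == "__main__":
    prompt = [3, 1, 4]
    verified = generate_verified(full_next, draft_next, prompt, 20)
    print(verified == greedy_full(prompt, 20))
```

Because every token that is kept is one the full model itself would have produced, the verified sequence matches full-model greedy decoding regardless of how often the draft is wrong. Draft errors only reduce how many tokens are accepted per round, which is why they cost speed rather than quality.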