Section 01
SparseX: Efficient Segment-Level KV Cache Sharing for Interleaved LLM Services (Introduction)
This paper proposes SparseX, an efficient segment-level KV cache sharing method for long-context LLM services. Conventional prefix caching reuses a KV cache only when requests share an identical token prefix, so it cannot exploit segments that recur at non-prefix positions across requests, dialogue rounds, and agents. SparseX reuses KV caches at the segment level and then restores the cross-segment context interactions that independently cached segments lack. To do so, it uses sparse Q indexing to estimate the key tokens and sparsely recomputes them, all within a single forward pass. The method is compatible with vLLM/PagedAttention and suits scenarios such as multi-turn dialogue and retrieval-augmented generation (RAG).
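The following minimal sketch uses toy integer tokens and plain Python lists in place of KV tensors. It contrasts prefix-style and segment-level cache lookup and shows one way a sparse "which tokens to recompute" selection could work. `prefix_block_keys` imitates hash-chained prefix caching in spirit only; it is not vLLM's code. `SegmentCache` and `select_tokens_to_recompute` are hypothetical names. The scoring rule (each cached token's maximum softmax attention weight over a few probe queries, keeping the top `budget` tokens) is an assumed stand-in for sparse Q indexing, not SparseX's published procedure.

```python
"""Toy sketch (hypothetical, not the SparseX implementation)."""
import hashlib
import math
from typing import Dict, List, Sequence, Tuple

Tokens = Tuple[int, ...]


def _h(*parts: object) -> str:
    # Deterministic content hash of the given parts.
    return hashlib.sha256(repr(parts).encode()).hexdigest()


def prefix_block_keys(tokens: Sequence[int], block: int = 4) -> List[str]:
    """Prefix-cache-style keys: each full block's key chains the previous key,
    so a block is reusable only if the entire preceding prefix also matches."""
    keys: List[str] = []
    parent = ""
    for i in range(0, len(tokens) - len(tokens) % block, block):
        parent = _h(parent, tuple(tokens[i:i + block]))
        keys.append(parent)
    return keys


class SegmentCache:
    """Hypothetical segment-level cache keyed by segment content alone,
    so a segment hits no matter where it appears in a prompt."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_compute(self, segment: Tokens) -> Tuple[str, bool]:
        """Return (kv_handle, hit). The string is a placeholder for real KV tensors."""
        key = _h(segment)
        if key in self._store:
            return self._store[key], True
        kv = f"kv{len(self._store)}"
        self._store[key] = kv
        return kv, False


def select_tokens_to_recompute(queries: Sequence[Sequence[float]],
                               keys: Sequence[Sequence[float]],
                               budget: int) -> List[int]:
    """Assumed scoring rule: score each cached token by its largest softmax
    attention weight over the probe queries, then return the indices of the
    top-`budget` tokens in positional order."""
    d = len(keys[0])
    scores = [0.0] * len(keys)
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        for j, e in enumerate(exps):
            scores[j] = max(scores[j], e / z)
    ranked = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:budget])


if __name__ == "__main__":
    doc = (7, 8, 9, 10)                      # a shared retrieved segment
    a = prefix_block_keys((1, 2, 3, 4) + doc)
    b = prefix_block_keys((5, 6, 6, 5) + doc)
    print(a[1] == b[1])                      # False: same doc, different prefix

    cache = SegmentCache()
    first = cache.get_or_compute(doc)[1]
    second = cache.get_or_compute(doc)[1]
    print(first, second)                     # False True

    print(select_tokens_to_recompute([[2, 0]], [[1, 0], [0, 1], [1, 1], [-1, 0]], 2))  # [0, 2]
```

The prefix-style key for the `doc` block depends on everything before it, so the same segment misses when it follows a different prefix. The content-keyed segment cache hits on the second lookup. In a real system, the positions returned by the selection step would then be recomputed with attention to the surrounding context; the sketch stops at selection.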