Section 01
[Introduction] KVDrive: A System-Level Solution to Memory Bottlenecks in Long-Context LLM Inference
Long-context LLM inference is bottlenecked by the KV cache, whose memory requirements grow linearly with sequence length. KVDrive addresses this at the system level: it manages the KV cache across three tiers (GPU memory, host memory, and SSD) and combines this multi-level cache management with attention-aware cache placement and pipeline scheduling optimizations. With this design, KVDrive achieves a 1.74x throughput improvement while maintaining accuracy.
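To make the linear growth concrete, the following is a minimal back-of-the-envelope sketch, not part of KVDrive itself. It uses the standard size estimate for a transformer KV cache (one key and one value tensor per layer). The function name `kv_cache_bytes` and the model shape (32 layers, 8 KV heads, head dimension 128, 2-byte elements) are illustrative assumptions, not figures from the source.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor 2 accounts for storing both keys and values
    in every layer. All other factors are per-token or per-layer sizes.
    """
    return (2 * batch_size * seq_len * num_layers
            * num_kv_heads * head_dim * bytes_per_elem)


# Hypothetical model shape, chosen only for illustration.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

for n in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(n, **cfg) / 2**30
    print(f"{n} tokens -> {gib:.1f} GiB")
```

Under these assumed dimensions, each token costs 128 KiB of cache, so a single sequence needs about 1 GiB at 8K tokens, 4 GiB at 32K, and 16 GiB at 128K. Growth of this kind is what pushes the cache beyond GPU memory and motivates spilling it to host memory and SSD.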