Section 01
[Introduction] NetKV: Network-Aware Optimization of KV Cache Scheduling for Disaggregated LLM Inference
This paper proposes NetKV, a system that addresses KV cache transfer scheduling in disaggregated LLM inference by introducing a network cost predictor to guide the selection of the decoding instance. On a 64-GPU simulator, NetKV reduces Time To First Token (TTFT) by 21.2% and improves the Service Level Objective (SLO) attainment rate by 20.1 percentage points, without requiring any modification to existing infrastructure.
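The idea of network-aware decoding instance selection can be pictured with a minimal sketch. This is not NetKV's actual predictor, which the summary does not describe. The cost model (fixed latency, plus KV cache size over available bandwidth, plus queueing delay) and the names `DecodeInstance`, `predicted_cost_ms`, and `select_decode_instance` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DecodeInstance:
    """Hypothetical view of one decoding instance as seen from a prefill node."""
    name: str
    bandwidth_gbps: float   # assumed available bandwidth, gigabits per second
    base_latency_ms: float  # assumed fixed per-transfer latency
    queue_delay_ms: float   # assumed current queueing delay at the instance


def predicted_cost_ms(kv_bytes: int, inst: DecodeInstance) -> float:
    """Assumed cost model: fixed latency + serialization time + queueing delay."""
    transfer_ms = (kv_bytes * 8) / (inst.bandwidth_gbps * 1e9) * 1e3
    return inst.base_latency_ms + transfer_ms + inst.queue_delay_ms


def select_decode_instance(
    kv_bytes: int, instances: Sequence[DecodeInstance]
) -> DecodeInstance:
    """Pick the instance with the lowest predicted cost for this KV cache."""
    if not instances:
        raise ValueError("no decoding instances available")
    return min(instances, key=lambda inst: predicted_cost_ms(kv_bytes, inst))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    candidates = [
        DecodeInstance("same-rack-busy", bandwidth_gbps=400.0,
                       base_latency_ms=0.05, queue_delay_ms=30.0),
        DecodeInstance("cross-rack-idle", bandwidth_gbps=100.0,
                       base_latency_ms=0.2, queue_delay_ms=0.0),
    ]
    for size in (100_000_000, 1_000_000_000):
        chosen = select_decode_instance(size, candidates)
        print(size, chosen.name, round(predicted_cost_ms(size, chosen), 2))
```

Under these assumed numbers, a small KV cache goes to the idle cross-rack instance because queueing delay dominates, while a large one goes to the busy same-rack instance because transfer time dominates. A scheduler that ignores the network would miss this trade-off.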