Section 01
[Main Floor/Introduction] Service-Induced Congestion: The Hidden Performance Killer of Memory-Constrained LLM Inference
The study identifies a phenomenon it calls "service-induced congestion" in LLM inference: the KV cache grows continuously as requests are served, memory pressure builds, and the system is forced to evict in-flight requests, which costs up to 50% of throughput. Using a discrete-time dynamical model, the paper presents what it describes as the first systematic analysis of the problem, and it proposes a stability criterion for heterogeneous workloads along with design principles for scheduling.
Original Authors and Source:
- Authors: Paper author team (arXiv:2606.15555v1)
- Source: arXiv
- Original Title: Service-Induced Congestion in Memory-Constrained LLM Serving
- Link: http://arxiv.org/abs/2606.15555v1
- Publication Date: June 14, 2026
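To make the mechanism in the summary concrete, here is a minimal toy sketch of KV-cache growth and eviction in discrete time. It is not the paper's model. The function `simulate`, its parameters, and the two admission policies are all assumptions made for illustration: every active request adds one KV token per step, memory is counted in tokens, and an evicted request loses all progress and returns to the queue.

```python
from collections import deque


def simulate(capacity, prompt_len, output_len, num_requests, reserve_full_length):
    """Toy discrete-time model (hypothetical, not the paper's formulation).

    Each step, every active request decodes one token, so its KV footprint
    grows by one. If total KV usage exceeds `capacity`, the most recently
    admitted request is evicted and its generated tokens are wasted.
    """
    # A single request must fit on its own, otherwise nothing can finish.
    assert prompt_len + output_len <= capacity

    waiting = deque(range(num_requests))
    active = {}  # request id -> tokens generated so far (insertion ordered)
    finished = evictions = wasted_tokens = steps = 0

    def kv_used():
        return sum(prompt_len + generated for generated in active.values())

    while finished < num_requests:
        steps += 1

        # Admission: either reserve the final footprint up front, or admit
        # greedily whenever the prompt alone fits in the memory free right now.
        if reserve_full_length:
            while waiting and (len(active) + 1) * (prompt_len + output_len) <= capacity:
                active[waiting.popleft()] = 0
        else:
            while waiting and kv_used() + prompt_len <= capacity:
                active[waiting.popleft()] = 0

        # Service: one decode step grows every active request's KV cache.
        for rid in active:
            active[rid] += 1

        # Completed requests release their memory.
        for rid in [r for r, g in active.items() if g >= output_len]:
            del active[rid]
            finished += 1

        # Memory pressure: evict newest-first until usage fits again.
        while kv_used() > capacity:
            rid, generated = active.popitem()  # last inserted entry
            waiting.appendleft(rid)            # retried later from scratch
            evictions += 1
            wasted_tokens += generated

    return {
        "steps": steps,
        "evictions": evictions,
        "wasted_tokens": wasted_tokens,
        "throughput": num_requests * output_len / steps,  # useful tokens per step
    }


if __name__ == "__main__":
    for reserve in (True, False):
        label = "reserve full length" if reserve else "greedy admission"
        print(label, simulate(100, 10, 30, 20, reserve_full_length=reserve))
```

With these illustrative numbers, the reserving policy never evicts, while greedy admission fills memory with prompts and starts evicting as soon as decoding begins. The `wasted_tokens` counter shows work that serving itself destroyed, which is the sense in which the congestion is service-induced. How the two policies compare on throughput depends on the chosen parameters, so the sketch reports both rather than assuming a winner.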