Section 01
Flow Control Scheduling Framework: Providing Provable Stability Guarantees for LLM Inference
This paper proposes a flow-control scheduling framework that addresses the memory growth and system instability caused by unknown decoding lengths in LLM inference. The framework borrows the idea of flow control from computer networks: its core mechanism regulates the rate at which prompts are admitted into the active set. Through theoretical analysis, the authors derive necessary conditions for the system to be stable and sufficient conditions under which the proposed algorithm keeps it stable, which together provide provable stability guarantees. Experiments show that the method outperforms commonly used scheduling strategies in throughput, latency, and KV cache stability, making it valuable for running large-scale LLM services reliably and efficiently.
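To make the idea of controlling the admission rate concrete, here is a minimal Python sketch built on explicit assumptions. It is not the paper's algorithm. The summary does not specify the control law, so the sketch substitutes an AIMD (additive-increase, multiplicative-decrease) rule of the kind used in TCP congestion control. The names `Request`, `AIMDAdmissionController`, `simulate`, and the parameters `kv_capacity`, `high_watermark`, `alpha`, and `beta` are all hypothetical. KV cache usage is modeled as one slot per prompt or generated token.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request model: KV usage = prompt tokens + generated tokens."""
    prompt_len: int   # known when the prompt arrives
    decode_len: int   # unknown to the scheduler; only the simulator reads it
    generated: int = 0

    def kv_tokens(self) -> int:
        return self.prompt_len + self.generated


class AIMDAdmissionController:
    """Assumed AIMD admission rule (borrowed from TCP), not the paper's algorithm."""

    def __init__(self, kv_capacity: int, high_watermark: float = 0.8,
                 alpha: float = 1.0, beta: float = 0.5):
        self.kv_capacity = kv_capacity
        self.high_watermark = high_watermark  # fraction of capacity treated as congestion
        self.alpha = alpha                    # additive increase per step
        self.beta = beta                      # multiplicative decrease on congestion
        self.window = 1.0                     # max prompts admitted per step

    def update(self, kv_used: int) -> None:
        # Shrink the admission window under memory pressure, otherwise grow it slowly.
        if kv_used > self.high_watermark * self.kv_capacity:
            self.window = max(1.0, self.window * self.beta)
        else:
            self.window += self.alpha

    def admit(self, queue: deque, active: list, kv_used: int) -> int:
        # Move up to int(window) prompts from the queue into the active set.
        admitted = 0
        while queue and admitted < int(self.window):
            if kv_used + queue[0].prompt_len > self.kv_capacity:
                break  # the next prompt does not fit in the cache right now
            req = queue.popleft()
            active.append(req)
            kv_used += req.prompt_len
            admitted += 1
        return admitted


def simulate(requests, kv_capacity: int, steps: int):
    """Run a toy decode loop; returns (peak KV tokens, number of finished requests)."""
    queue = deque(requests)
    active = []
    ctrl = AIMDAdmissionController(kv_capacity)
    peak = finished = 0
    for _ in range(steps):
        kv_used = sum(r.kv_tokens() for r in active)
        ctrl.update(kv_used)
        ctrl.admit(queue, active, kv_used)
        for r in active:          # one decode step: every active request emits a token
            r.generated += 1
        peak = max(peak, sum(r.kv_tokens() for r in active))
        finished += sum(1 for r in active if r.generated >= r.decode_len)
        active = [r for r in active if r.generated < r.decode_len]
    return peak, finished


if __name__ == "__main__":
    reqs = [Request(prompt_len=10, decode_len=5) for _ in range(20)]
    print(simulate(reqs, kv_capacity=100, steps=200))
```

Consider a toy run with 20 requests, each with a 10-token prompt and a 5-token output, against a 100-token cache. The controller shrinks its window once usage crosses the watermark. The peak can still overshoot `kv_capacity`, though, because each admitted request keeps growing its cache entry and the scheduler cannot see the remaining decode length. This residual overshoot is the kind of instability that formal stability conditions, like those the paper derives, are meant to rule out. A production scheduler would also need preemption or eviction.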