Section 01
DriftSched: An Adaptive QoS-Aware Scheduling Framework for Multi-Tenant LLM Inference
DriftSched is a scheduling framework that addresses the token drift problem in large language model (LLM) inference in multi-tenant environments. It uses an adaptive, QoS-aware mechanism to improve inference performance and resource utilization. This article covers its background, architecture, strategy, experiments, and value.