Section 01
Introduction: GoodServe, a High-Goodput Service System for Agentic LLM Inference on Heterogeneous GPUs
This article introduces GoodServe, a system that addresses the scheduling problem for agentic LLM inference services on heterogeneous GPU clusters. It combines three core techniques: a prediction-correction routing strategy, accurate output length estimation, and runtime request migration. Together, these raise goodput, the proportion of requests that meet their service-level objective (SLO), by 27.4% on average compared with existing methods.
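To make the metric concrete, the sketch below computes goodput as the fraction of requests that meet their SLO. It assumes the SLO is a per-request end-to-end latency deadline, which the text above does not specify; the `Request` class, its fields, and the sample values are hypothetical and are not taken from GoodServe.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical fields for illustration; not GoodServe's data model.
    latency_s: float  # observed end-to-end latency, in seconds
    slo_s: float      # latency deadline (SLO) for this request, in seconds


def goodput(requests):
    """Return the fraction of requests whose latency is within their SLO."""
    if not requests:
        return 0.0
    met = sum(1 for r in requests if r.latency_s <= r.slo_s)
    return met / len(requests)


# Made-up sample: three of the four requests finish within their deadline.
requests = [
    Request(latency_s=1.2, slo_s=2.0),
    Request(latency_s=3.5, slo_s=2.0),
    Request(latency_s=0.8, slo_s=1.0),
    Request(latency_s=4.0, slo_s=5.0),
]
print(goodput(requests))  # 0.75
```

Under this definition, a scheduler improves goodput only by turning missed deadlines into met ones; finishing an already on-time request faster does not change the metric.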