Section 01
Introduction: RecomputeOrMigrate—A Network-Aware KV Cache Recovery Scheduler for Disaggregated LLM Inference
RecomputeOrMigrate (abbreviated RoM/KVRS) is a lightweight scheduler for disaggregated LLM inference systems. It addresses the KV cache recovery decision that arises after a decoding GPU fails: at recovery time it chooses between migrating the KV cache and recomputing it, based on real-time network bandwidth and prompt length. Experiments show that it can improve effective throughput by 8.6%, offering new insight into the reliability design of disaggregated architectures.
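To make the migrate-or-recompute decision concrete, here is a minimal sketch of one way it could be framed. It is not the RoM/KVRS implementation. The cost model is an assumption: migration time is taken as a fixed setup latency plus KV cache size divided by measured bandwidth, and recomputation time as prompt length divided by a prefill throughput. The names `ModelShape`, `choose_recovery`, `setup_seconds`, and `prefill_tokens_per_s` are hypothetical, as are all numeric values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model dimensions, used only to size the KV cache."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # assumes fp16/bf16 cache entries


def kv_bytes_per_token(shape: ModelShape) -> int:
    # One key and one value vector (factor 2) per layer and per KV head.
    return (2 * shape.num_layers * shape.num_kv_heads
            * shape.head_dim * shape.bytes_per_value)


def migrate_seconds(prompt_tokens: int, shape: ModelShape,
                    bandwidth_bytes_per_s: float, setup_seconds: float) -> float:
    # Assumed model: fixed transfer setup cost plus size / bandwidth.
    cache_bytes = prompt_tokens * kv_bytes_per_token(shape)
    return setup_seconds + cache_bytes / bandwidth_bytes_per_s


def recompute_seconds(prompt_tokens: int, prefill_tokens_per_s: float) -> float:
    # Assumed model: prefill cost grows linearly with prompt length.
    return prompt_tokens / prefill_tokens_per_s


def choose_recovery(prompt_tokens: int, shape: ModelShape,
                    bandwidth_bytes_per_s: float, prefill_tokens_per_s: float,
                    setup_seconds: float = 0.005) -> str:
    """Return "migrate" or "recompute", whichever has the lower estimated time."""
    t_migrate = migrate_seconds(prompt_tokens, shape,
                                bandwidth_bytes_per_s, setup_seconds)
    t_recompute = recompute_seconds(prompt_tokens, prefill_tokens_per_s)
    return "migrate" if t_migrate < t_recompute else "recompute"


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)
    # Long prompt over a fast link, a slow link, then a short prompt.
    print(choose_recovery(4096, shape, 10e9, 20_000))
    print(choose_recovery(4096, shape, 1e9, 20_000))
    print(choose_recovery(64, shape, 10e9, 20_000))
```

Under these assumed numbers, the long prompt is migrated when the link is fast and recomputed when it is slow, while the short prompt is recomputed even on the fast link because the fixed setup cost dominates. A real scheduler would replace the constants with live bandwidth measurements and a calibrated prefill cost.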