Section 01
[Introduction] TIE: Uncertainty-Aware Output Length Prediction for Optimizing LLM Inference Scheduling
TIE is the open-source project accompanying an ICML 2026 paper. It targets the GPU idling that arises in batched LLM inference when requests in the same batch produce outputs of very different lengths. To address this, it proposes an uncertainty-aware output length prediction method and uses the predictions to optimize inference scheduling, reducing GPU idle time and improving inference throughput. The project is built on the vLLM framework, and the source code is available at https://github.com/Hyzheng-code/TIE.
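To make the idling problem concrete, the following is a minimal sketch under a deliberately simplified assumption: a batch keeps its slots occupied until its longest request finishes, so shorter requests leave their slots idle. The function name `idle_fraction` and the example lengths are hypothetical and are not taken from the TIE codebase or from vLLM; the real scheduler may behave differently.

```python
def idle_fraction(output_lengths):
    """Fraction of decode slots left idle in one batch.

    Simplifying assumption (not from the TIE paper): the batch runs for as
    many decode steps as its longest request, and a finished request's slot
    stays empty until the batch ends.
    """
    if not output_lengths:
        return 0.0
    longest = max(output_lengths)
    total_slots = longest * len(output_lengths)  # steps x batch size
    used_slots = sum(output_lengths)             # steps that produced a token
    return 1.0 - used_slots / total_slots


# Hypothetical output lengths (in tokens) for two batches of four requests.
mixed = [10, 20, 400, 30]        # one long request dominates the batch
grouped = [380, 400, 390, 410]   # requests of similar length batched together

print(f"mixed batch idle fraction:   {idle_fraction(mixed):.3f}")
print(f"grouped batch idle fraction: {idle_fraction(grouped):.3f}")
```

Under this toy model, batching requests with similar output lengths wastes far fewer slots than mixing short and long requests, which is why an estimate of output length made before scheduling can be useful.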