Section 01
TIE Scheduler: Core Guide to Uncertainty-Aware Optimization of LLM Inference Scheduling
The TIE scheduler targets a gap in LLM inference scheduling: conventional schedulers rely on point estimates of output length and ignore its inherent randomness. Analysis shows that output lengths follow a heavy-tailed distribution, well fitted by a log-t distribution, which motivates the proposed Tail Inflated Expectation (TIE) metric for adjusting length estimates against the risk of unexpectedly long outputs. In experiments, TIE reduces per-token latency by 2.31x in online inference and raises throughput by 1.42x in offline batch processing.
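To make the idea of a "tail inflated" length estimate concrete, here is a minimal sketch. The source does not give TIE's exact formula, so this assumes one plausible reading: blend the empirical mean of sampled output lengths with the conditional tail expectation (CVaR) of the worst tail, so that heavy right tails inflate the estimate. The function names `tie_score` and `cvar`, and the parameters `q` and `alpha`, are hypothetical illustrations, not the paper's definitions.

```python
import random


def cvar(samples, q=0.9):
    """Conditional value-at-risk: mean of the worst (1 - q) fraction of samples."""
    xs = sorted(samples)
    k = max(1, round((1 - q) * len(xs)))
    return sum(xs[-k:]) / k


def tie_score(samples, q=0.9, alpha=0.5):
    """Hypothetical tail-inflated expectation: blend the mean with the tail CVaR.

    alpha controls how much the heavy tail inflates the plain mean estimate.
    """
    mean = sum(samples) / len(samples)
    return (1 - alpha) * mean + alpha * cvar(samples, q)


# Heavy-tailed synthetic output lengths: a log-normal stand-in for the
# log-t distribution mentioned in the summary.
random.seed(0)
lengths = [int(random.lognormvariate(5.0, 1.0)) for _ in range(10_000)]

plain_mean = sum(lengths) / len(lengths)
inflated = tie_score(lengths)
# The inflated estimate is always at least the plain mean for a right-skewed
# distribution, making the scheduler more conservative about long outputs.
print(f"mean={plain_mean:.1f}  TIE={inflated:.1f}")
```

A scheduler could then sort or batch requests by `tie_score` of their predicted length distributions instead of a single point estimate, penalizing requests whose tails are risky even when their means are similar.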