Section 01
[Introduction] Frontier: A High-Precision Discrete Event Simulator for Modern LLM Inference Services
Frontier is a discrete-event simulator built for modern LLM inference services. It supports runtime optimizations including prefill-decode disaggregation (PDD), attention-FFN disaggregation (AFD), CUDA Graphs, and speculative decoding. On a 16-GPU H800 test platform, its average throughput error is below 4% and its end-to-end latency error is reduced from 44.9% to 6.4%; the simulator itself can scale to thousands of GPUs. Frontier aims to provide "decision-level fidelity" that helps system designers optimize cluster configurations and make architecture choices.