Section 01
[Introduction] LLM-Emu: An Introduction to the Service-Native LLM Inference Simulator
LLM-Emu is a service-native simulator for vLLM. Its core innovation is retaining vLLM's production-grade HTTP service layer, scheduler, KV cache management, and output processing paths, while replacing GPU forward execution with profiling-sampled latencies and synthetic output tokens. Across a range of GPUs, models, and workloads, its time-per-output-token (TPOT) and inter-token latency (ITL) errors are ≤4.8%, its end-to-end latency error is ≤5.3%, and its output throughput error is only 1.9%, making it a low-cost, high-fidelity experimental tool for LLM serving-system research.
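To make the core idea concrete, the following is a minimal sketch of the substitution described above: the GPU forward pass is replaced by a stand-in that replays a latency drawn from an offline profile and emits a synthetic token, so the scheduling and service layers above it observe realistic timing. All names here (`StepProfile`, `EmulatedModelRunner`, `execute_step`) are hypothetical illustrations, not LLM-Emu's or vLLM's actual API.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class StepProfile:
    """Per-step latency samples (seconds), collected offline from real GPU runs.

    Hypothetical structure for illustration; not LLM-Emu's actual format.
    """
    prefill_samples: list[float]  # latencies observed for prefill steps
    decode_samples: list[float]   # latencies observed for decode steps


class EmulatedModelRunner:
    """Stands in for the GPU model runner: instead of executing a real
    forward pass, sleep for a profiling-sampled latency and return a
    synthetic token id."""

    def __init__(self, profile: StepProfile, vocab_size: int = 32000):
        self.profile = profile
        self.vocab_size = vocab_size

    def execute_step(self, is_prefill: bool) -> int:
        samples = (self.profile.prefill_samples if is_prefill
                   else self.profile.decode_samples)
        # Replay a latency drawn from the offline profile so the scheduler,
        # KV cache manager, and HTTP layer see realistic step timing.
        time.sleep(random.choice(samples))
        # The token content is irrelevant to service-layer behavior, so a
        # random id within the vocabulary suffices.
        return random.randrange(self.vocab_size)


if __name__ == "__main__":
    runner = EmulatedModelRunner(
        StepProfile(prefill_samples=[0.045, 0.052],
                    decode_samples=[0.011, 0.013])
    )
    # One prefill step followed by several decode steps for a single request.
    tokens = [runner.execute_step(is_prefill=(i == 0)) for i in range(8)]
    print(tokens)
```

Because everything above the model runner is unchanged, metrics such as TPOT, ITL, and end-to-end latency emerge from the real scheduling and batching logic rather than from an analytical model, which is what allows the reported error rates to stay low.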