Section 01
Continuous Batching LLM Inference Performance Modeling: A System Study Combining Theory & Practice
This post introduces EE384S-Project by Jav331 (source: GitHub, link: https://github.com/Jav331/EE384S-Project, updated 2026-06-16). The study combines queueing theory, SimPy simulation, analytical models, and measurements of vLLM on real hardware to analyze key performance metrics of continuous batching in LLM inference, including TTFT (Time to First Token), throughput (goodput), and blocking behavior.
The project connects theoretical modeling with measured system behavior, which makes it useful both to researchers and to practitioners who deploy LLM inference.
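To make the three metrics concrete, here is a minimal Python sketch of how they could be computed from a per-request log. It is not taken from the project: the `Request` record, the `summarize` helper, the `ttft_slo` threshold, and the definition of goodput as the token throughput of requests whose TTFT meets that threshold are all assumptions made for illustration, and the project may define these quantities differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical log record for one inference request."""
    arrival: float                # time the request arrived (seconds)
    first_token: Optional[float]  # time of the first output token; None if blocked
    tokens: int                   # number of output tokens generated


def summarize(requests: List[Request], horizon: float, ttft_slo: float) -> dict:
    """Compute mean TTFT, throughput, goodput, and blocking fraction.

    Assumed definitions (illustrative only):
      - TTFT       = first_token - arrival, for requests that were served
      - throughput = output tokens of all served requests / horizon
      - goodput    = output tokens of served requests with TTFT <= ttft_slo / horizon
      - blocking   = fraction of requests that never produced a token
    """
    if not requests or horizon <= 0:
        raise ValueError("need at least one request and a positive horizon")

    served = [r for r in requests if r.first_token is not None]
    ttfts = [r.first_token - r.arrival for r in served]

    mean_ttft = sum(ttfts) / len(ttfts) if ttfts else float("nan")
    throughput = sum(r.tokens for r in served) / horizon
    goodput = sum(
        r.tokens for r, t in zip(served, ttfts) if t <= ttft_slo
    ) / horizon
    blocking = 1.0 - len(served) / len(requests)

    return {
        "mean_ttft": mean_ttft,
        "throughput": throughput,
        "goodput": goodput,
        "blocking": blocking,
    }


if __name__ == "__main__":
    # Three made-up requests over a 10-second window; the third is blocked.
    log = [
        Request(arrival=0.0, first_token=0.5, tokens=100),
        Request(arrival=1.0, first_token=3.0, tokens=50),
        Request(arrival=2.0, first_token=None, tokens=0),
    ]
    print(summarize(log, horizon=10.0, ttft_slo=1.0))
```

In this sketch, throughput counts every generated token while goodput counts only tokens from requests that started responding within the latency target, so the gap between the two shows how much served work arrived too late to be useful under the assumed threshold.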