Section 01
llm_perf: A Guide to the First-Principles Analysis Framework for LLM Inference Performance
llm_perf is a lightweight LLM inference performance modeling framework built from first principles. Its core goal is to predict latency, throughput, and memory usage before you build or rent a hardware cluster. It supports analysis of the prefill phase, the decode phase, end-to-end metrics, and disaggregated prefill/decode deployments. By replacing empirical benchmarking with mathematical modeling, it reduces trial-and-error costs and accelerates system optimization.
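To make "first-principles modeling" concrete, here is a minimal sketch of the kind of estimate such a framework produces: per-token decode latency derived from bytes moved and memory bandwidth, rather than from benchmarks. The function name and all model/hardware numbers below are illustrative assumptions, not llm_perf's actual API.

```python
# Illustrative first-principles estimate of per-token decode latency.
# The function and its numbers are assumptions for this example,
# not part of the llm_perf API.

def decode_latency_s(n_params, kv_cache_bytes, mem_bw_bytes_s, dtype_bytes=2):
    """Decode is typically memory-bandwidth bound: generating each token
    requires streaming the full model weights plus the KV cache from HBM,
    so latency ~= bytes moved / memory bandwidth."""
    bytes_moved = n_params * dtype_bytes + kv_cache_bytes
    return bytes_moved / mem_bw_bytes_s

# Hypothetical example: a 7B-parameter model in FP16 on a GPU with
# 2 TB/s of HBM bandwidth and a 2 GiB KV cache for the current batch.
lat = decode_latency_s(
    n_params=7e9,
    kv_cache_bytes=2 * 1024**3,
    mem_bw_bytes_s=2e12,
)
print(f"~{lat * 1e3:.2f} ms/token, ~{1 / lat:.0f} tokens/s")
```

Estimates like this can be refined with compute-bound terms for prefill and per-layer breakdowns, but even the simple bandwidth model above often predicts decode throughput within the right order of magnitude, which is the point of modeling before buying hardware.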