Section 01
[Introduction] Dooly: Core Interpretation of a Configuration-Agnostic Performance Profiling System for LLM Inference Simulation
Dooly is a configuration-agnostic, redundancy-aware performance profiling system for LLM inference simulation. Traditional simulators incur a high cost from full re-profiling. Dooly addresses this by using taint propagation to mark where each input dimension originates, so that a single inference pass yields profiles for multiple configurations. It keeps simulation error within 5% for TTFT (time to first token) and 8% for TPOT (time per output token) while reducing profiling GPU time across 12 models by 56.4%, providing an efficient solution for configuration optimization in LLM deployment.
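To make the taint-propagation idea concrete, here is a minimal Python sketch. It is not Dooly's implementation or API: the names `TaintedDim`, `dim`, `trace_linear_shapes`, and `profile_key`, and the example layer, are all hypothetical and chosen only for illustration. Each dimension carries the set of inputs it was derived from, and operators whose concrete shapes coincide across configurations share one profile key, so they would only need to be measured once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintedDim:
    """An integer dimension tagged with the names of the inputs it derives from."""
    value: int
    sources: frozenset

    def _combine(self, other, op):
        if isinstance(other, TaintedDim):
            # Taint propagates as the union of both operands' sources.
            return TaintedDim(op(self.value, other.value),
                              self.sources | other.sources)
        # Plain ints are treated as untainted constants.
        return TaintedDim(op(self.value, other), self.sources)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    def __floordiv__(self, other):
        return self._combine(other, lambda a, b: a // b)


def dim(name, value):
    """Create a dimension tainted with a single source label."""
    return TaintedDim(value, frozenset({name}))


def trace_linear_shapes(batch, seq_len, hidden, tp):
    """Hypothetical operator: input shape of a tensor-parallel linear layer."""
    tokens = batch * seq_len   # tainted by {"batch", "seq_len"}
    shard = hidden // tp       # tainted by {"hidden", "tp"}
    return ("linear", tokens, shard)


def profile_key(op):
    """Operators with the same name and concrete shape can share one measurement."""
    name, *dims = op
    return (name, tuple(d.value for d in dims))


# Two different configurations that happen to produce the same operator shape.
op_a = trace_linear_shapes(dim("batch", 4), dim("seq_len", 128),
                           dim("hidden", 4096), dim("tp", 2))
op_b = trace_linear_shapes(dim("batch", 4), dim("seq_len", 128),
                           dim("hidden", 8192), dim("tp", 4))

print(sorted(op_a[1].sources))                   # where the token dimension came from
print(profile_key(op_a) == profile_key(op_b))    # redundant profile can be reused
```

In this sketch the taint labels record which configuration inputs influence each dimension, while the profile key exposes redundancy between configurations. How Dooly itself represents taints and deduplicates profiles is not specified here, so treat the structure above as an assumption.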