As large language models (LLMs) spread across industries, deploying and scaling inference infrastructure efficiently has become a core challenge for engineering teams. GPUs are expensive and in short supply, and buying hardware without data on the expected workload tends to end in either wasted capacity or performance bottlenecks. Before deploying to production, developers need a tool that can simulate realistic inference loads locally and evaluate the effects of different hardware configurations, scheduling strategies, and optimization techniques.
Tokenmill was created to address this need. It is a high-performance discrete event simulator written in Rust, designed specifically to simulate the behavior of LLM inference clusters. Through precise mathematical modeling and rich configuration options, Tokenmill can predict key metrics such as system latency, throughput, memory usage, and energy consumption before any hardware is deployed.
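To make the term "discrete event simulator" concrete, here is a minimal sketch of the general technique in Rust. It is not Tokenmill's code or API: the names `Event`, `Stats`, and `simulate` are hypothetical, and the model is deliberately simplified to a single server that handles requests in arrival order with fixed service times. The core idea is that the simulation clock jumps from one scheduled event to the next instead of advancing in fixed time steps.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Hypothetical event kinds for a single-server queue (not Tokenmill's types).
/// Derived ordering puts Arrival before Completion when timestamps tie.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Event {
    Arrival { id: usize },
    Completion { id: usize },
}

/// Aggregate results of one simulation run. Times are in microseconds.
#[derive(Debug, PartialEq)]
pub struct Stats {
    pub completed: usize,
    pub makespan_us: u64,
    pub total_latency_us: u64,
}

impl Stats {
    /// Mean time from arrival to completion, or 0.0 if nothing completed.
    pub fn mean_latency_us(&self) -> f64 {
        if self.completed == 0 {
            0.0
        } else {
            self.total_latency_us as f64 / self.completed as f64
        }
    }
}

/// Simulates requests given as (arrival_time_us, service_time_us) pairs
/// on one server that processes them first-come, first-served.
pub fn simulate(requests: &[(u64, u64)]) -> Stats {
    // Min-heap of (timestamp, event): Reverse turns the max-heap into a min-heap.
    let mut events: BinaryHeap<Reverse<(u64, Event)>> = BinaryHeap::new();
    for (id, &(arrival, _)) in requests.iter().enumerate() {
        events.push(Reverse((arrival, Event::Arrival { id })));
    }

    let mut waiting: VecDeque<usize> = VecDeque::new();
    let mut busy = false;
    let mut now = 0u64;
    let mut total_latency_us = 0u64;
    let mut completed = 0usize;

    // Main loop: pop the earliest event and jump the clock straight to it.
    while let Some(Reverse((time, event))) = events.pop() {
        now = time;
        match event {
            Event::Arrival { id } => waiting.push_back(id),
            Event::Completion { id } => {
                busy = false;
                completed += 1;
                total_latency_us += now - requests[id].0;
            }
        }
        // If the server is idle, start the next waiting request and
        // schedule its completion as a future event.
        if !busy {
            if let Some(id) = waiting.pop_front() {
                busy = true;
                events.push(Reverse((now + requests[id].1, Event::Completion { id })));
            }
        }
    }

    Stats { completed, makespan_us: now, total_latency_us }
}
```

For example, `simulate(&[(0, 100), (50, 100), (400, 100)])` completes all three requests at a simulated time of 500 µs with a total latency of 350 µs: the second request waits 50 µs behind the first, while the third finds the server idle. A simulator for real inference clusters would replace the fixed service times with models of batching, memory, and hardware behavior, but the event loop has the same shape.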