Section 01
Marmot: A Practical Tool for Precise VRAM Estimation in LLM Deployment
Marmot is an open-source command-line tool, written in Rust, for planning VRAM in LLM deployments. It reads a model's configuration from Hugging Face or ModelScope and quickly calculates the GPU memory needed to deploy it, with support for dense, MoE, multimodal, and quantized models. It answers common planning questions, such as how much VRAM each precision requires, how much the KV cache adds, and how MoE models differ from dense ones, filling a gap in precise pre-deployment resource planning.
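To make the kind of arithmetic involved concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Marmot's implementation or its CLI: the names `ModelShape`, `weight_bytes`, and `kv_cache_bytes` are hypothetical, the model shape is an illustrative 7B-class configuration rather than any specific checkpoint, and the formulas are the commonly used first-order estimates (weights = parameters × bytes per parameter; KV cache = 2 × layers × KV heads × head dimension × sequence length × batch size × bytes per element). Activation memory, framework overhead, and quantization metadata such as scales are deliberately ignored.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Approximate bytes per parameter for common formats. Quantized formats
# also store scales/zero points, which this sketch does not count.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass
class ModelShape:
    """Hypothetical container for the few config fields the estimate needs."""
    num_params: int    # total parameters; for MoE, count all experts,
                       # since they are typically all resident in memory
    num_layers: int
    num_kv_heads: int  # smaller than the attention head count under GQA/MQA
    head_dim: int


def weight_bytes(shape: ModelShape, dtype: str) -> float:
    """First-order weight memory: parameter count times bytes per parameter."""
    return shape.num_params * BYTES_PER_PARAM[dtype]


def kv_cache_bytes(shape: ModelShape, seq_len: int, batch_size: int,
                   dtype: str = "fp16") -> float:
    """KV cache size: one K and one V tensor per layer, per token, per sequence."""
    per_token = (2 * shape.num_layers * shape.num_kv_heads
                 * shape.head_dim * BYTES_PER_PARAM[dtype])
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Illustrative 7B-class dense model (assumed values, not a real config).
    shape = ModelShape(num_params=7_000_000_000, num_layers=32,
                       num_kv_heads=32, head_dim=128)
    for dtype in ("fp16", "int8", "int4"):
        print(f"weights {dtype}: {weight_bytes(shape, dtype) / GIB:.2f} GiB")
    kv = kv_cache_bytes(shape, seq_len=4096, batch_size=1)
    print(f"KV cache (fp16, 4096 tokens, batch 1): {kv / GIB:.2f} GiB")
```

Under these assumptions the KV cache grows linearly with both context length and batch size, while weight memory depends only on parameter count and precision. That is why the same model can fit comfortably at short contexts and run out of memory at long ones.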