Section 01
Marmot: A Practical Tool for Precise VRAM Estimation in LLM Deployment
Marmot is an open-source command-line tool written in Rust for planning VRAM before an LLM deployment. It estimates the GPU memory needed to deploy a model directly from its Hugging Face or ModelScope config, and supports dense, MoE, multimodal, and quantized models. It answers common planning questions, such as how VRAM requirements change with precision, how much the KV cache adds, and how MoE models differ from dense ones, filling a gap in precise pre-deployment resource planning.
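
To make the precision and KV cache questions concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Marmot's implementation or its formula. The function names and the example model shape (7B parameters, 32 layers, 32 KV heads, head dimension 128) are assumptions chosen for illustration. The sketch counts only weights and KV cache, and ignores activations, framework overhead, and quantization metadata such as scales and zero points.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Memory for model weights alone at a given precision."""
    return num_params * bits_per_param // 8


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_elem: int = 2,  # assumption: FP16/BF16 cache entries
) -> int:
    """Memory for the KV cache: one key and one value tensor per layer."""
    return (
        2 * num_layers * num_kv_heads * head_dim
        * seq_len * batch_size * bytes_per_elem
    )


if __name__ == "__main__":
    # Hypothetical 7B dense model; the shape values are illustrative only.
    params = 7_000_000_000
    for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        print(f"{label} weights: {weight_bytes(params, bits) / GIB:.2f} GiB")

    kv = kv_cache_bytes(
        num_layers=32, num_kv_heads=32, head_dim=128,
        seq_len=4096, batch_size=1,
    )
    print(f"KV cache at 4096 tokens, batch 1: {kv / GIB:.2f} GiB")
```

In this simplified model the KV cache grows linearly with both sequence length and batch size, so long contexts or large batches can rival the weights themselves. For MoE models, the weight term generally has to cover all experts, since they typically stay resident in memory even though only a few are active per token.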