Section 01
NanoDeploy: Introduction to the High-Performance Large Model Inference Engine for Production Environments
NanoDeploy is an open-source LLM inference engine developed by the DeepLink team, designed for the high-concurrency demands of production environments. Through architectural innovations and optimization techniques such as Prefill-Decode separation and wide expert parallelism, it delivers high throughput and low latency. It supports mainstream models including DeepSeek, Qwen, and Kimi, and provides an efficient solution for deploying large-scale model services.
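To make the Prefill-Decode separation idea concrete, the toy sketch below (not NanoDeploy's actual API; all names here are hypothetical) models the two phases as distinct workers: prefill consumes the whole prompt in one compute-bound pass and builds a KV cache, while decode generates tokens one step at a time against that cache. Separating the two lets each phase run on dedicated hardware, so long prefills do not stall the latency-sensitive decode loop.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt_tokens: list[int]
    max_new_tokens: int
    kv_cache: list[int] = field(default_factory=list)  # toy stand-in for a real KV cache
    output: list[int] = field(default_factory=list)

def prefill_worker(req: Request) -> Request:
    # Prefill phase: one batched pass over the full prompt populates
    # the KV cache. In a real engine this is the compute-bound step.
    req.kv_cache = list(req.prompt_tokens)
    return req

def decode_worker(req: Request) -> Request:
    # Decode phase: autoregressive loop; each step reads the KV cache
    # and appends one token (here a dummy "next token" = last + 1).
    # In a real engine this step is memory-bandwidth-bound.
    for _ in range(req.max_new_tokens):
        nxt = (req.kv_cache[-1] + 1) if req.kv_cache else 0
        req.kv_cache.append(nxt)
        req.output.append(nxt)
    return req

# In a disaggregated deployment, prefill_worker and decode_worker would
# run in separate worker pools; here they are chained in-process.
req = decode_worker(prefill_worker(Request([1, 2, 3], max_new_tokens=4)))
print(req.output)  # [4, 5, 6, 7]
```

The point of the sketch is only the phase boundary: everything before the KV cache exists is prefill work, everything after is decode work, and the handoff between the two is what a disaggregated architecture schedules independently.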