Section 01
xLLM: Guide to JD's Open-Source High-Performance Large Model Inference Engine and Domestic AI Chip Optimization Practice
xLLM is a high-performance LLM inference framework open-sourced by JD, deeply optimized for domestic (Chinese-made) AI accelerators. Through core technologies such as a service-engine decoupled architecture, full-graph pipelined execution, dynamic-shape graph optimization, and global KV Cache management, it delivers enterprise-grade high-throughput, low-latency distributed inference. The framework is widely deployed across JD's core retail businesses (intelligent customer service, risk control, supply-chain optimization, advertising recommendation, etc.) and is a production-proven solution.
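To make the "global KV Cache management" idea concrete, the sketch below illustrates the general concept of prefix-keyed KV-cache reuse: requests that share a prompt prefix can reuse the cached attention states for that prefix instead of recomputing them. This is a minimal toy model of the technique, not xLLM's actual API; all class and method names here are hypothetical.

```python
import hashlib

class GlobalKVCache:
    # Toy model: map a hash of a token prefix to its cached KV state,
    # so a new request only computes attention for the unshared suffix.
    # (Names are illustrative; this is not xLLM's real interface.)

    def __init__(self):
        self._store = {}  # prefix hash -> cached KV state (placeholder object)

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def longest_cached_prefix(self, tokens):
        # Return (n_reused, state) for the longest cached prefix of `tokens`.
        for n in range(len(tokens), 0, -1):
            state = self._store.get(self._key(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None

    def insert(self, tokens, state):
        self._store[self._key(tokens)] = state

cache = GlobalKVCache()
cache.insert([1, 2, 3], "kv-state-for-[1,2,3]")
reused, _ = cache.longest_cached_prefix([1, 2, 3, 4, 5])
print(reused)  # → 3: tokens 1..3 reuse cached state; only 4 and 5 need fresh computation
```

In a real engine the cached state would be device-resident tensor blocks and the lookup would typically use a radix/prefix tree rather than a linear hash scan, but the payoff is the same: shared prefixes are computed once and reused across requests.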