Section 01
Introduction: xLLMs, an Inference Engine Targeting Memory Bottlenecks in LLM Inference
xLLMs is an inference engine project on GitHub for next-generation large language models. It is designed to relieve memory bottlenecks in LLM inference, improving inference efficiency and system throughput. Its core innovations are a multi-level memory management architecture and an LRU-K eviction strategy, offering a new option for deploying LLMs in memory-constrained environments.
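For readers unfamiliar with LRU-K: unlike plain LRU, which evicts the entry whose single most recent access is oldest, LRU-K evicts the entry whose K-th most recent access lies furthest in the past, so one-off accesses do not displace frequently reused entries. The following is a minimal, hypothetical Python sketch of the policy (class and method names are illustrative assumptions, not xLLMs's actual implementation):

```python
from collections import defaultdict, deque


class LRUKCache:
    """Minimal LRU-K cache sketch (illustrative, not the xLLMs code).

    Eviction picks the entry whose K-th most recent access is oldest;
    entries with fewer than K recorded accesses are evicted first.
    """

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.data = {}
        # Keep only the last k access times per key.
        self.history = defaultdict(lambda: deque(maxlen=k))
        self.clock = 0  # logical clock instead of wall time

    def _tick(self):
        self.clock += 1
        return self.clock

    def get(self, key):
        if key not in self.data:
            return None
        self.history[key].append(self._tick())
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            self._evict()
        self.data[key] = value
        self.history[key].append(self._tick())

    def _evict(self):
        def kth_recent(key):
            h = self.history[key]
            # With maxlen=k, h[0] is the k-th most recent access time.
            # Fewer than k accesses => lowest priority, evict first.
            return h[0] if len(h) == self.k else float("-inf")

        victim = min(self.data, key=kth_recent)
        del self.data[victim]
        del self.history[victim]
```

With k=2, an entry touched twice survives a burst of entries touched only once, which is the property that makes LRU-K attractive for caches (such as KV caches) where scan-like access patterns would otherwise flush hot data.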