LMCache: An Efficient Caching System for Large Language Models (Introduction)
LMCache is a memory-efficient caching system tailored for large language models (LLMs). It enables KV cache reuse across sessions through intelligent caching mechanisms, addressing a core pain point of traditional KV caches: they are discarded when a session ends and cannot be shared between requests. By eliminating this redundant prefill computation, LMCache significantly reduces inference cost and response latency for large-scale LLM applications.
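To make the idea of cross-session KV cache reuse concrete, here is a minimal toy sketch, not LMCache's actual API: a store keyed by a hash of the token prefix, so a second session sending the same prefix (e.g. a shared system prompt) retrieves the cached entry instead of recomputing it. All names (`PrefixKVCache`, `get_or_compute`) and the token IDs are hypothetical, for illustration only.

```python
import hashlib


class PrefixKVCache:
    """Toy cross-session KV cache keyed by a hash of the token prefix.

    Illustration only: real systems such as LMCache store actual attention
    key/value tensors and manage memory tiers; here the 'KV' is a placeholder.
    """

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, tokens):
        # Hash the token-ID sequence so identical prefixes map to one entry.
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def get_or_compute(self, tokens, compute_kv):
        key = self._key(tokens)
        if key in self.store:
            self.hits += 1          # reuse: no prefill recomputation
        else:
            self.misses += 1        # first session pays the compute cost
            self.store[key] = compute_kv(tokens)
        return self.store[key]


# Two sessions share the same system-prompt prefix; only the first computes.
cache = PrefixKVCache()
system_prompt = [101, 2023, 2003, 1037, 2291]  # hypothetical token IDs
kv_a = cache.get_or_compute(system_prompt, lambda t: {"kv_len": len(t)})
kv_b = cache.get_or_compute(system_prompt, lambda t: {"kv_len": len(t)})
print(cache.hits, cache.misses)  # → 1 1
```

The second lookup is a cache hit, so the (placeholder) prefill work runs once. This is the essence of the cross-session reuse described above; the production system additionally handles eviction, storage tiers, and transfer of the cached tensors.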