Section 01
Kitty Project Introduction: 2-bit KV Cache Quantization Scheme Solves LLM Inference Memory Bottleneck
The Kitty project proposes a KV cache quantization method that stores the cache at only 2 bits per value while aiming to keep large language model (LLM) inference accurate and fast. It does this with a dynamic channel precision enhancement technique, which boosts precision on a per-channel basis. The scheme targets a core bottleneck in LLM inference, namely the GPU memory consumed by the KV cache, which grows rapidly with context length and batch size. This makes it directly relevant to practical deployment.
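To give a sense of the scale of this bottleneck, the following is a rough back-of-the-envelope sketch of KV cache size at 16 bits versus 2 bits per value. The model dimensions (32 layers, 32 KV heads, head dimension 128, 4096-token context, batch size 1) are illustrative assumptions and are not taken from the Kitty project. The helper `kv_cache_bytes` is hypothetical, and the estimate ignores quantization metadata such as scales and zero-points, as well as any channels kept at higher precision, so a real 2-bit scheme would save somewhat less than the ideal 8x shown here.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bits_per_value):
    """Approximate KV cache size in bytes.

    Counts one key and one value vector per token, per KV head, per layer.
    Ignores quantization metadata (scales, zero-points) and any
    mixed-precision channels, so this is an idealized lower bound.
    """
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value // 8


# Illustrative model shape (an assumption, not a figure from the Kitty project).
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128,
           seq_len=4096, batch_size=1)

fp16_bytes = kv_cache_bytes(**cfg, bits_per_value=16)
int2_bytes = kv_cache_bytes(**cfg, bits_per_value=2)

GIB = 2 ** 30
print(f"16-bit KV cache: {fp16_bytes / GIB:.2f} GiB")   # 2.00 GiB
print(f" 2-bit KV cache: {int2_bytes / GIB:.2f} GiB")   # 0.25 GiB
print(f"Ideal reduction: {fp16_bytes // int2_bytes}x")  # 8x
```

Because the size is linear in both `seq_len` and `batch_size`, doubling either one doubles the cache, which is why long contexts and large batches run into GPU memory limits so quickly.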