Section 01
LeanKV: 2-3x LLM Inference Acceleration via Activation Sparsity + KV Cache Quantization
The LeanKV project combines two techniques, activation sparsity and KV cache quantization, to raise the inference throughput of large language models (LLMs) by 2-3x without loss of model accuracy, offering a practical approach to efficient LLM deployment. The project is maintained by asmit383, and its source code is hosted on GitHub.
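To make the two ingredients concrete, here is a minimal, generic sketch in plain Python. It is not LeanKV's implementation. It assumes symmetric per-vector int8 "absmax" quantization for a cached key or value vector, and simple magnitude thresholding for activation sparsity. The function names `quantize_int8`, `dequantize_int8` and `sparsify`, and the threshold value, are hypothetical and chosen only for illustration.

```python
"""Toy illustration of two generic techniques (hypothetical, NOT LeanKV's code):
symmetric int8 quantization of a KV-cache vector, and magnitude-based
activation sparsity."""

from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector int8 quantization (assumed scheme).

    scale = max|x| / 127, q = clamp(round(x / scale), -127, 127).
    Storing q (1 byte each) plus one float scale replaces 2-4 bytes per value.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid divide-by-zero
    q = [max(-127, min(127, round(x / scale))) for x in vec]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate values; per-element error is at most scale / 2."""
    return [v * scale for v in q]


def sparsify(acts: List[float], threshold: float) -> List[float]:
    """Zero out activations whose magnitude is below `threshold` (assumed rule).

    Zeroed entries can be skipped in later matrix multiplies.
    """
    return [a if abs(a) >= threshold else 0.0 for a in acts]


if __name__ == "__main__":
    key = [0.12, -1.5, 0.8, 0.03]
    q, s = quantize_int8(key)
    print("quantized:", q, "scale:", s)
    print("dequantized:", dequantize_int8(q, s))
    print("sparse:", sparsify([0.02, -0.9, 0.05, 1.2], threshold=0.1))
```

In this toy form, quantization shrinks the memory the KV cache occupies, while sparsity reduces the arithmetic spent on near-zero activations. A real system would also need fused kernels to turn these savings into measured throughput gains.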