Section 01
LightKV Introduction: Lightweight KV Caching for Efficient LVLM Deployment
LightKV addresses the KV-cache memory bottleneck in Large Vision-Language Models (LVLMs) with a cross-modal message-passing mechanism that compresses the KV cache of visual tokens. By retaining only 55% of the original visual tokens, it nearly halves the KV cache, cuts computational load by 40%, preserves model performance, and significantly outperforms compression baselines that consider only visual information.
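To make the token-retention idea concrete, here is a minimal sketch of pruning a visual-token KV cache down to a fixed keep ratio. The scoring signal is hypothetical: the section does not specify how LightKV's cross-modal message passing ranks tokens, so this example simply assumes each visual token already carries an importance score (e.g. attention mass received from text tokens) and keeps the top 55%. The function name `compress_visual_kv` and all shapes are illustrative, not LightKV's actual API.

```python
import numpy as np

def compress_visual_kv(keys, values, scores, keep_ratio=0.55):
    """Keep the top `keep_ratio` fraction of visual tokens by importance.

    keys, values : (num_visual_tokens, head_dim) cached K/V for visual tokens.
    scores       : (num_visual_tokens,) per-token importance; here assumed to
                   come from cross-modal (text-to-visual) attention, which is
                   an illustrative stand-in for LightKV's message passing.
    Returns the compressed K/V and the kept indices in original order, so
    positional information of the surviving tokens is preserved.
    """
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    # Indices of the n_keep highest-scoring tokens, restored to cache order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return keys[keep], values[keep], keep

# Toy example: 10 visual tokens, head dimension 4.
rng = np.random.default_rng(0)
K = rng.standard_normal((10, 4))
V = rng.standard_normal((10, 4))
s = rng.random(10)

K_c, V_c, idx = compress_visual_kv(K, V, s)
print(K_c.shape)  # 6 of 10 tokens kept: the cache is roughly halved
```

Retaining 55% of the visual tokens shrinks the visual portion of the KV cache to 55% of its original size, which is where the "nearly halves the KV cache" figure comes from; the attention work over those tokens drops accordingly.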