Section 01
TurboQuant+ Overview: Production-Grade LLM KV Cache & Weight Quantization
TurboQuant+ is a production-grade implementation of Google's TurboQuant paper, built as an extension to llama.cpp. It combines Walsh-Hadamard rotation with polar codebook quantization to compress the KV cache by up to 4.6x while maintaining model quality. It supports multiple backends (Apple Silicon, NVIDIA CUDA, AMD ROCm, Vulkan), and its additive design leaves existing llama.cpp functionality intact.
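To make the rotate-then-quantize idea concrete, here is a minimal sketch in pure Python. It is not TurboQuant+ code: the function names (`fwht`, `quantize_uniform`, `rotate_quantize_roundtrip`) are hypothetical, and a plain symmetric uniform quantizer stands in for the polar codebook, which this overview does not specify. It only illustrates why an orthonormal Walsh-Hadamard rotation helps: it preserves a vector's norm while spreading the energy of an outlier across all coordinates, and because it is its own inverse, the quantization error keeps the same norm when rotated back.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * scale for v in out]


def quantize_uniform(values, bits):
    """Symmetric uniform scalar quantizer.

    Assumption: a stand-in for illustration only, NOT the polar codebook.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2")
    levels = (1 << (bits - 1)) - 1          # e.g. 7 for 4 bits
    peak = max(abs(v) for v in values) or 1.0
    step = peak / levels
    codes = [round(v / step) for v in values]  # integers in [-levels, levels]
    return codes, step


def rotate_quantize_roundtrip(vector, bits):
    """Rotate, quantize, dequantize, rotate back. Returns (restored, step)."""
    rotated = fwht(vector)
    codes, step = quantize_uniform(rotated, bits)
    dequantized = [c * step for c in codes]
    # The orthonormal transform is its own inverse.
    return fwht(dequantized), step


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))
```

For example, `fwht([8.0] + [0.0] * 63)` turns a single spike of magnitude 8 into 64 coordinates of magnitude 1, so the peak value a per-coordinate quantizer must cover drops by a factor of 8 while the norm is unchanged. On the compression figure: if the baseline is a 16-bit cache (an assumption; the baseline is not stated here), 4.6x corresponds to roughly 16 / 4.6 ≈ 3.5 bits per stored value.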