Near-lossless KV cache compression using angular quantization in the Walsh-Hadamard domain at ~3.5 bits per element.
March 31, 2026
Original Paper
TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization
arXiv · 2603.27467
The Takeaway
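The method relies on a simple observation: after a random diagonal rotation, consecutive element pairs in the Walsh-Hadamard domain have approximately uniformly distributed angles, so a uniform angle quantizer can compress keys and values with minimal perplexity degradation. At ~3.5 bits per element, compared with 16 bits for an FP16/BF16 cache, the KV cache shrinks by roughly 4.6x. That allows substantially longer context windows on VRAM-limited hardware without the accuracy loss that scalar quantization usually brings.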
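To make the VRAM claim concrete, the arithmetic below estimates per-token KV cache size and how many tokens fit in a fixed memory budget. It is a back-of-the-envelope sketch: the model shape (32 layers, 32 KV heads, head dimension 128) and the 8 GiB budget are hypothetical illustrative values, not figures from the paper, and the 3.5-bit rate is taken at face value with no extra side information added.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, bits: float) -> float:
    """Bytes of K and V cache stored per token, summed over all layers."""
    elements = 2 * n_layers * n_kv_heads * head_dim  # 2 = one K and one V vector per head
    return elements * bits / 8

# Hypothetical 7B-class shape (illustrative only, not taken from the paper).
LAYERS, KV_HEADS, HEAD_DIM = 32, 32, 128
BUDGET_BYTES = 8 * 2**30  # assume 8 GiB of VRAM is free for the KV cache

fp16 = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bits=16)
angle = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bits=3.5)

for name, per_token in (("FP16", fp16), ("3.5-bit", angle)):
    tokens = int(BUDGET_BYTES // per_token)
    print(f"{name:>8}: {per_token / 1024:.0f} KiB/token -> {tokens} tokens fit")
print(f"compression ratio: {fp16 / angle:.2f}x")
```

Under these assumptions the budget holds 16,384 tokens at FP16 (512 KiB per token) versus 74,898 tokens at 3.5 bits (112 KiB per token), a 16/3.5 ≈ 4.57x ratio. Real savings also depend on weights, activations, and any per-vector metadata the method stores.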
From the abstract
We compress KV cache entries by quantizing angles in the Fast Walsh-Hadamard domain, where a random diagonal rotation makes consecutive element pairs approximately uniformly distributed on the unit circle. We extend this angular quantizer with per-layer early-boost, which independently configures K and V codebook sizes at each layer, allocating higher precision to a model-specific subset of critical layers. Across seven models (1B to 7B parameters), per-layer early-boost achieves lossless compre
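The encoder the abstract describes can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the "random diagonal rotation" is modeled as a random ±1 sign flip per coordinate, the transform is an orthonormal FWHT (so the vector length must be a power of two), each pair's angle is rounded to one of `n_bins` uniform bins, and the pair radius is kept in full precision because the excerpt does not say how the paper stores it. The names `fwht`, `encode`, `decode`, and `n_bins` are hypothetical.

```python
import math
import random

def fwht(x):
    """Orthonormal Fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly
        h *= 2
    scale = 1 / math.sqrt(n)
    return [v * scale for v in x]  # with this scaling the transform is its own inverse

def encode(vec, signs, n_bins):
    """Sign-flip, FWHT, then quantize the angle of each consecutive pair."""
    y = fwht([s * v for s, v in zip(signs, vec)])
    radii, codes = [], []
    for i in range(0, len(y), 2):
        a, b = y[i], y[i + 1]
        theta = math.atan2(b, a)  # angle in (-pi, pi]
        codes.append(int((theta + math.pi) / (2 * math.pi) * n_bins) % n_bins)  # uniform bins
        radii.append(math.hypot(a, b))  # assumption: radius kept in full precision
    return radii, codes

def decode(radii, codes, signs, n_bins):
    """Rebuild pairs at bin centres, invert the FWHT, undo the sign flips."""
    y = []
    for r, c in zip(radii, codes):
        theta = -math.pi + (c + 0.5) * 2 * math.pi / n_bins
        y.extend((r * math.cos(theta), r * math.sin(theta)))
    return [s * v for s, v in zip(signs, fwht(y))]  # ±1 signs are self-inverse

rng = random.Random(0)
dim, n_bins = 128, 64  # 64 angle levels = 6 bits per pair = 3 bits per element (angle codes only)
signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]  # shared random diagonal (sign) matrix
key = [rng.gauss(0.0, 1.0) for _ in range(dim)]

radii, codes = encode(key, signs, n_bins)
approx = decode(radii, codes, signs, n_bins)
rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(key, approx)) / sum(a * a for a in key))
print(f"relative L2 error: {rel_err:.4f}")
```

Because the sign flip and the orthonormal FWHT preserve vector length, the reconstruction error comes only from the angle rounding: each pair is off by at most half a bin, so the relative L2 error cannot exceed 2·sin(π/(2·n_bins)), about 0.049 for 64 bins. Per-layer early-boost, as the abstract describes it, would correspond to choosing `n_bins` independently for K and for V at each layer, with larger codebooks on a model-specific set of critical layers; the excerpt does not say which layers or codebook sizes the paper uses.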