AI & ML Efficiency Breakthrough

ITQ3_S achieves high-fidelity 3-bit LLM inference by using rotation-domain smoothing to eliminate the catastrophic precision loss caused by outliers.

March 31, 2026

Original Paper

ITQ3_S: High-Fidelity 3-bit LLM Inference via Interleaved Ternary Quantization with Rotation-Domain Smoothing

Edward J. Yoon

arXiv · 2603.27914

The Takeaway

ITQ3_S tackles the "outlier problem" in low-bit quantization by spreading outlier energy across weights with the Fast Walsh-Hadamard Transform. This enables 3-bit deployment on consumer hardware (an RTX 5090) with effective throughput exceeding 1.5 TB/s and near-FP16 perplexity.
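The core idea of rotation-domain smoothing can be illustrated with a minimal sketch (not the paper's implementation; the function and test vector here are our own illustration). Rotating a weight vector with an orthonormal Fast Walsh-Hadamard Transform preserves its energy but mixes every coordinate into every output, so a single extreme outlier gets diluted across all entries, and the dynamic range a 3-bit quantizer must cover shrinks:

```python
import numpy as np

def fwht(x):
    """Orthonormal Fast Walsh-Hadamard Transform via the standard butterfly."""
    x = x.astype(np.float64).copy()
    n = len(x)  # assumed to be a power of two
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # scale to make the transform orthonormal

# A toy weight row: small values plus one extreme inter-channel outlier.
w = 0.1 * np.arange(8)
w[3] += 8.0

rotated = fwht(w)

# Peak-to-mean ratio of the magnitudes: a proxy for how badly an
# outlier stretches the quantization grid. It drops after rotation.
before = np.abs(w).max() / np.abs(w).mean()
after = np.abs(rotated).max() / np.abs(rotated).mean()
print(f"before rotation: {before:.2f}, after rotation: {after:.2f}")
```

Because the transform is orthogonal, it can be inverted exactly (or folded into adjacent layers), so the rotation costs no model accuracy by itself; it only reshapes the distribution the quantizer sees.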

From the abstract

We present ITQ3_S (Interleaved Ternary Quantization -- Specialized), a novel 3-bit weight quantization format for large language models (LLMs) that integrates TurboQuant (TQ), a rotation-domain adaptive quantization strategy based on the Fast Walsh-Hadamard Transform (FWHT). Conventional 3-bit quantization methods suffer from catastrophic precision loss caused by heavy-tailed weight distributions and inter-channel outliers. ITQ3_S addresses this fundamental limitation by pre-