AI & ML Efficiency Breakthrough

Enables stable 4-bit microscaling (MXFP4) quantization for Multi-modal LLMs, a regime in which prior methods suffered performance collapse.

March 18, 2026

Original Paper

BATQuant: Outlier-resilient MXFP4 Quantization via Learnable Block-wise Optimization

Ji-Fu Li, Manyi Zhang, Xiaobo Xia, Han Bao, Haoli Bai, Zhenhua Dong, Xianzhi Yu

arXiv · 2603.16590

The Takeaway

MXFP4 is the emerging hardware standard for efficient inference. By using learnable block-wise transformations that prevent outlier energy from 'bleeding' across blocks, this method recovers 96% of full-precision performance under aggressive W4A4 (4-bit weights and activations) configurations.
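To make the format concrete, here is a minimal NumPy sketch of MXFP4 fake quantization, assuming the standard OCP Microscaling layout: blocks of 32 elements, one power-of-two shared scale per block, and FP4 (E2M1) elements whose representable magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}. The helper name `mxfp4_fake_quantize` and the simple amax-based scale rule are illustrative assumptions, not the paper's method, which learns the block-wise transforms.

```python
import numpy as np

# Representable non-negative values of the FP4 (E2M1) element format
# used by MXFP4 (OCP Microscaling spec).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_fake_quantize(x: np.ndarray, block_size: int = 32) -> np.ndarray:
    """Quantize-dequantize a 1-D tensor to MXFP4: each block of
    `block_size` elements shares one power-of-two scale, and each
    element is rounded to the nearest FP4 (E2M1) value.
    Assumes len(x) is divisible by block_size."""
    blocks = x.reshape(-1, block_size)
    # Shared scale per block: smallest power of two that brings the
    # block's absolute max within the FP4 range [-6, 6].
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = 2.0 ** np.ceil(np.log2(np.maximum(amax, 1e-12) / FP4_GRID[-1]))
    scaled = blocks / scale
    # Sign-aware rounding to the nearest FP4 grid point.
    sign = np.sign(scaled)
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (sign * FP4_GRID[idx] * scale).reshape(x.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=128)
print("max quantization error:", np.abs(w - mxfp4_fake_quantize(w)).max())
```

The key property this exposes: the shared scale is driven by each block's absolute maximum, so a single outlier only degrades the precision of its own 32-element block, provided nothing mixes its energy into other blocks.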

From the abstract

Microscaling floating-point (MXFP) formats have emerged as a promising standard for deploying Multi-modal Large Language Models (MLLMs) and Large Language Models (LLMs) on modern accelerator architectures. However, existing Post-Training Quantization (PTQ) methods, particularly rotation-based techniques designed for integer formats, suffer from severe performance collapse when applied to MXFP4. Recent studies attribute this failure to a fundamental format mismatch: global orthogonal rotations in
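The format mismatch the abstract describes can be seen numerically. Below is a small, hypothetical demonstration (not the paper's code): a random global orthogonal rotation spreads a single outlier's energy into every coordinate, inflating the per-32-block absolute maximum that drives each block's shared scale; under block-wise microscaling, this costs every block precision instead of just one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=128)
x[0] = 100.0  # a single activation outlier

# Random global orthogonal rotation (Q factor of a Gaussian matrix).
Q, _ = np.linalg.qr(rng.normal(size=(128, 128)))
y = Q @ x

for name, v in [("original", x), ("rotated ", y)]:
    # Per-block absolute max: this is what sets each block's shared scale.
    amax = np.abs(v.reshape(-1, 32)).max(axis=1)
    print(name, np.round(amax, 1))
# original: only block 0's scale is inflated by the outlier.
# rotated : every block's amax is inflated well above the ~3.0 inlier
#           level, so all blocks lose resolution under a shared scale.
```

This is the intuition behind confining transformations to individual blocks: the rotation trick that helps integer formats (by flattening outliers globally) actively hurts a format whose scales are block-local.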