Enables 'elastic inference': a single trained model can be switched between multiple lower-precision formats on the fly, without retraining.
April 2, 2026
Original Paper
MF-QAT: Multi-Format Quantization-Aware Training for Elastic Inference
arXiv · 2604.00529
The Takeaway
Standard QAT commits to a single target numeric format at training time; this framework trains one checkpoint that supports a range of MXINT/MXFP precisions at runtime. That is a major win for practitioners deploying models across diverse hardware with varying precision support.
From the abstract
Quantization-aware training (QAT) is typically performed for a single target numeric format, while practical deployments often need to choose numerical precision at inference time based on hardware support or runtime constraints. We study multi-format QAT, where a single model is trained to be robust across multiple quantization formats. We find that multi-format QAT can match single-format QAT at each target precision, yielding one model that performs well overall across different formats, even […]
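The core idea can be sketched in a few lines: at each training step, sample one of the supported formats, fake-quantize the weights to that format for the forward pass, and apply the gradient to the full-precision weights via a straight-through estimator. The sketch below is a toy illustration of this recipe, not the paper's implementation; it uses plain symmetric integer fake quantization as a stand-in for the MXINT/MXFP microscaling formats, and `multi_format_qat` is a hypothetical name on a toy least-squares problem.

```python
import numpy as np

def fake_quant(w, bits):
    """Symmetric per-tensor fake quantization: quantize, then dequantize.

    Assumption: plain integer quantization as a stand-in for the
    MXINT/MXFP formats discussed in the paper."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    if scale == 0:
        return w
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def multi_format_qat(X, y, formats=(4, 6, 8), steps=500, lr=0.05, seed=0):
    """Toy multi-format QAT on a linear least-squares model.

    Each step samples one target bit-width, runs the forward pass with
    fake-quantized weights, and applies the gradient to the
    full-precision weights (straight-through estimator)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        bits = rng.choice(formats)          # sample a format for this step
        wq = fake_quant(w, bits)            # forward pass in low precision
        residual = X @ wq - y
        grad = (2.0 / n) * (X.T @ residual) # dL/dwq for the MSE loss
        w -= lr * grad                      # STE: update the fp weights
    return w
```

After training, the single full-precision checkpoint `w` can be fake-quantized to any of the formats seen during training and evaluated at that precision, which is the "one model, many formats" property the abstract describes.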