AI & ML Efficiency Breakthrough

Proposes a temporal mixed-precision framework for diffusion models that adaptively assigns bitwidths across different denoising timesteps.

March 17, 2026

Original Paper

TMPDiff: Temporal Mixed-Precision for Diffusion Models

Basile Lewandowski, Simon Kurz, Aditya Shankar, Robert Birke, Jian-Jia Chen, Lydia Y. Chen

arXiv · 2603.14062

The Takeaway

Standard quantization applies a single fixed bitwidth, but noise levels in diffusion models change across denoising steps; by assigning precision per timestep, this framework achieves a 2.5x speedup while retaining 90% fidelity on models such as FLUX.1. Because it is training-free, it gives developers a drop-in way to reduce the latency of state-of-the-art generative models.

From the abstract

Diffusion models are the go-to method for Text-to-Image generation, but their iterative denoising process has high inference latency. Quantization reduces compute time by using lower bitwidths, but applies a fixed precision across all denoising timesteps, leaving an entire optimization axis unexplored. We propose TMPDiff, a temporal mixed-precision framework for diffusion models that assigns different numeric precision to different denoising timesteps. We hypothesize that quantization errors a…
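The core idea — a per-timestep bitwidth schedule applied during the denoising loop — can be sketched in a few lines. This is only an illustrative toy, not the paper's method: TMPDiff's actual bitwidth assignment is determined by its framework, whereas the linear low-to-high split and the `fake_quantize` helper below are hypothetical stand-ins.

```python
import numpy as np

def fake_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric fake quantization: snap values to a (2^bits)-level grid."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels
    if scale == 0:
        return x
    return np.round(x / scale) * scale

def bitwidth_schedule(num_steps: int, low_bits: int = 4, high_bits: int = 8) -> list:
    """Hypothetical schedule: spend fewer bits on the early, noisy timesteps
    and more on the later, detail-refining ones. The real assignment in the
    paper is chosen by the framework; this split is purely for illustration."""
    split = num_steps // 2
    return [low_bits] * split + [high_bits] * (num_steps - split)

# Toy denoising loop: quantize the working tensor at each step's bitwidth.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
for t, bits in enumerate(bitwidth_schedule(num_steps=10)):
    x = fake_quantize(x, bits)
```

The appeal of a schedule like this is that it changes only numeric precision per step, not model weights, which is what makes the approach training-free.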