Proposes a temporal mixed-precision framework for diffusion models that adaptively assigns bitwidths across different denoising timesteps.
March 17, 2026
Original Paper
TMPDiff: Temporal Mixed-Precision for Diffusion Models
arXiv · 2603.14062
The Takeaway
Standard quantization applies a fixed bitwidth to every denoising step, but diffusion noise levels vary across steps; this framework exploits that variation to achieve a 2.5x speedup at 90% fidelity on models like FLUX.1. Because it is training-free, it offers developers a drop-in optimization for reducing the latency of state-of-the-art generative models.
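The core idea can be sketched as a timestep-dependent bitwidth schedule. This is an illustrative toy, not the paper's actual assignment: the function name, thresholds, and bitwidth choices below are hypothetical, showing only the general pattern of using lower precision at high-noise steps and higher precision for final refinement.

```python
# Hypothetical temporal bitwidth schedule: early (high-noise) denoising
# steps tolerate aggressive low-bit quantization, while late steps that
# refine fine detail fall back to higher precision. Thresholds and
# bitwidths are illustrative, not taken from TMPDiff.

def bitwidth_for_step(step: int, total_steps: int) -> int:
    """Return an illustrative bitwidth for a given denoising step."""
    progress = step / total_steps  # 0.0 = pure noise, ~1.0 = final image
    if progress < 0.5:
        return 4   # coarse structure: robust to quantization error
    elif progress < 0.8:
        return 8   # mid-trajectory steps
    else:
        return 16  # final refinement: preserve fidelity

# A 50-step sampler would then run each step at its assigned precision.
schedule = [bitwidth_for_step(t, 50) for t in range(50)]
```

In a real pipeline, each step's UNet/DiT forward pass would be dispatched to kernels quantized at the scheduled bitwidth, so the average cost per step drops without retraining.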
From the abstract
Diffusion models are the go-to method for text-to-image generation, but their iterative denoising process incurs high inference latency. Quantization reduces compute time by using lower bitwidths, but it typically applies a fixed precision across all denoising timesteps, leaving an entire optimization axis unexplored. We propose TMPDiff, a temporal mixed-precision framework for diffusion models that assigns different numeric precision to different denoising timesteps. We hypothesize that quantization errors a