AI & ML Efficiency Breakthrough

Provides a robust method for distilling discrete diffusion models that maintains quality and diversity even with very few sampling steps.

March 23, 2026

Original Paper

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans

arXiv · 2603.20155

The Takeaway

Distilling discrete diffusion models (common for text) has been significantly harder than distilling continuous ones. This technique allows fast, high-quality generation that can even outperform the original teacher model, making discrete diffusion much more viable for production.

From the abstract

It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling steps to a handful. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text and
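The abstract does not spell out the D-MMD objective, but the general idea behind maximum mean discrepancy (MMD) is to compare two distributions through the average kernel similarity of their samples. As a rough illustration of that general principle only (not the paper's actual discrete objective), here is a minimal kernel-MMD estimator over toy embedding vectors standing in for teacher and student samples; the Gaussian kernel and the `sigma` bandwidth are illustrative choices:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=2.0):
    # Pairwise Gaussian kernel values between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=2.0):
    # Biased estimate of squared MMD between sample sets x and y:
    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
# Toy stand-ins for embedded samples from a teacher and two students.
teacher = rng.normal(0.0, 1.0, size=(256, 8))
student_close = rng.normal(0.0, 1.0, size=(256, 8))  # matches teacher
student_far = rng.normal(2.0, 1.0, size=(256, 8))    # shifted distribution

print(mmd2(teacher, student_close))  # small: distributions match
print(mmd2(teacher, student_far))    # larger: distributions differ
```

Training a student to minimize a discrepancy of this kind pushes its sample distribution toward the teacher's as a whole, rather than matching individual outputs, which is what lets distilled models preserve diversity.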