Achieves up to a 4.7x speedup for diffusion LLMs using a training-free self-speculative decoding framework.
March 27, 2026
Original Paper
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
arXiv · 2603.25702
The Takeaway
S2D2 enables practical acceleration of block-diffusion language models without the cost of extra training or a separate drafter model. By using the same model as both a parallel proposer and an autoregressive verifier, it sidesteps the brittleness of standard confidence-thresholded decoding.
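The propose-then-verify loop can be sketched with a toy stand-in for the model. Everything here is illustrative, not the paper's implementation: `toy_model` is a deterministic dummy predictor, and `draft_block` deliberately injects a drafting error so the verification step has something to reject.

```python
def toy_model(prefix):
    # Toy stand-in for the model's greedy next-token prediction:
    # the next token is simply len(prefix), giving the sequence 1, 2, 3, ...
    return len(prefix)

def draft_block(prefix, block_size):
    # Parallel proposer (hypothetical): draft every position of the block
    # at once from the current prefix, without conditioning each draft
    # token on the previous one -- cheap, but fallible.
    base = toy_model(prefix)
    draft = [base + i for i in range(block_size)]
    if len(prefix) % 2 == 1:
        draft[-1] += 100  # inject a deliberate drafting error to exercise verification
    return draft

def verify_block(prefix, draft):
    # Autoregressive verifier: the *same* model replays the block token by
    # token, accepts the longest prefix of the draft that matches its own
    # greedy predictions, and substitutes its prediction at the first mismatch.
    accepted = []
    for tok in draft:
        expected = toy_model(prefix + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # correction token, then stop
            break
    return accepted

def self_speculative_decode(prompt, n_tokens, block_size=4):
    # Alternate cheap parallel drafting with exact autoregressive checking,
    # so the final output matches plain autoregressive decoding.
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        draft = draft_block(out, block_size)
        out += verify_block(out, draft)
    return out[len(prompt):len(prompt) + n_tokens]
```

The point of the sketch is the invariant: because acceptance is checked against the verifier's own predictions, the output is identical to what step-by-step decoding would produce, while correct drafts let several tokens land per verification pass.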
From the abstract
Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or inc
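The threshold tradeoff described above can be made concrete with a toy model of confidence-thresholded unmasking. The confidence-growth rule (each denoising pass raises every still-masked position's confidence by a fixed number of points) is an assumption for illustration only, not the paper's dynamics.

```python
def denoising_steps(confidences, tau, gain=10):
    # Toy confidence-thresholded decoding: each pass unmasks every position
    # whose confidence (in percent) has reached the threshold tau, and is
    # assumed to raise each remaining position's confidence by `gain` points.
    steps, remaining = 0, list(confidences)
    while remaining:
        steps += 1
        remaining = [c + gain for c in remaining if c < tau]
    return steps

# Four block positions with mixed initial confidences.
confs = [90, 60, 30, 80]

# Aggressive threshold: one pass, but the 30-confidence token is
# committed immediately -- this is where quality degrades.
fast = denoising_steps(confs, tau=30)

# Conservative threshold: quality-safe, but many extra denoising passes.
slow = denoising_steps(confs, tau=95)
```

Under this toy rule, the aggressive threshold finishes in a single pass while the conservative one needs eight, which is exactly the brittleness the abstract describes: no fixed threshold is both fast and safe.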