Achieves up to a 4.7x speedup for diffusion LLMs using a training-free self-speculative decoding framework.
March 27, 2026
Original Paper
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation
arXiv · 2603.25702
The Takeaway
S2D2 enables practical acceleration of block-diffusion language models without the cost of extra training or a separate drafter model. By using the same model as both a parallel proposer and an autoregressive verifier, it sidesteps the brittleness of standard confidence-thresholded decoding.
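The propose-then-verify loop can be sketched with a toy stand-in for the model. Everything here is illustrative, not the paper's implementation: `toy_model` is a deterministic dummy predictor, and `draft_block` deliberately injects a drafting error so the verification step has something to reject.

```python
def toy_model(prefix):
    # Toy stand-in for the model's greedy next-token prediction:
    # the next token is simply len(prefix), giving the sequence 1, 2, 3, ...
    return len(prefix)

def draft_block(prefix, block_size):
    # Parallel proposer (hypothetical): draft every position of the block
    # at once from the current prefix, without conditioning each draft
    # token on the previous one -- cheap, but fallible.
    base = toy_model(prefix)
    draft = [base + i for i in range(block_size)]
    if len(prefix) % 2 == 1:
        draft[-1] += 100  # inject a deliberate drafting error to exercise verification
    return draft

def verify_block(prefix, draft):
    # Autoregressive verifier: the *same* model replays the block token by
    # token, accepts the longest prefix of the draft that matches its own
    # greedy predictions, and substitutes its prediction at the first mismatch.
    accepted = []
    for tok in draft:
        expected = toy_model(prefix + accepted)
        if tok == expected:
            accepted.append(tok)
        else:
            accepted.append(expected)  # correction token, then stop
            break
    return accepted

def self_speculative_decode(prompt, n_tokens, block_size=4):
    # Alternate cheap parallel drafting with exact autoregressive checking,
    # so the final output matches plain autoregressive decoding.
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        draft = draft_block(out, block_size)
        out += verify_block(out, draft)
    return out[len(prompt):len(prompt) + n_tokens]
```

The point of the sketch is the invariant: because acceptance is checked against the verifier's own predictions, the output is identical to what step-by-step decoding would produce, while correct drafts let several tokens land per verification pass.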
From the abstract
Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or inc
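The threshold tradeoff described above can be made concrete with a toy model of confidence-thresholded unmasking. The confidence-growth rule (each denoising pass raises every still-masked position's confidence by a fixed number of points) is an assumption for illustration only, not the paper's dynamics.

```python
def denoising_steps(confidences, tau, gain=10):
    # Toy confidence-thresholded decoding: each pass unmasks every position
    # whose confidence (in percent) has reached the threshold tau, and is
    # assumed to raise each remaining position's confidence by `gain` points.
    steps, remaining = 0, list(confidences)
    while remaining:
        steps += 1
        remaining = [c + gain for c in remaining if c < tau]
    return steps

# Four block positions with mixed initial confidences.
confs = [90, 60, 30, 80]

# Aggressive threshold: one pass, but the 30-confidence token is
# committed immediately -- this is where quality degrades.
fast = denoising_steps(confs, tau=30)

# Conservative threshold: quality-safe, but many extra denoising passes.
slow = denoising_steps(confs, tau=95)
```

Under this toy rule, the aggressive threshold finishes in a single pass while the conservative one needs eight, which is exactly the brittleness the abstract describes: no fixed threshold is both fast and safe.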