AI & ML Paradigm Shift

Identifies a fundamental quality-exploration dilemma in Diffusion Language Models where remasking improves single-sample quality but kills reasoning diversity.

April 2, 2026

Original Paper

Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models

Liancheng Fang, Aiwei Liu, Henry Peng Zou, Yankai Chen, Enze Ma, Leyi Pan, Chunyu Miao, Wei-Chieh Huang, Xue Liu, Philip S. Yu

arXiv · 2604.00375

The Takeaway

The paper characterizes why dLLMs struggle to match multi-sample gains (Pass@k) and proposes an Independent Metropolis-Hastings sampler to recover diverse reasoning paths, outperforming standard autoregressive and random-remasking approaches.
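For intuition, the core of an Independent Metropolis-Hastings sampler is a simple accept/reject rule: draw each proposal independently of the current state and accept it with probability min(1, p(x')q(x) / (p(x)q(x'))). The sketch below shows that generic step; the `propose` interface is a hypothetical stand-in for a dLLM that can score its own draws, not the paper's actual implementation.

```python
import math
import random

def imh_step(current, current_logp_target, current_logp_prop, propose):
    """One Independent Metropolis-Hastings step.

    `propose()` draws a fresh sample independently of `current` and
    returns (sample, log p_target(sample), log q_proposal(sample)).
    This interface is an illustrative assumption, standing in for a
    dLLM sampler that scores its own generations.
    """
    cand, cand_lt, cand_lq = propose()
    # IMH acceptance ratio p(x')q(x) / (p(x)q(x')), computed in log space
    log_alpha = (cand_lt - current_logp_target) - (cand_lq - current_logp_prop)
    if math.log(random.random()) < log_alpha:
        return cand, cand_lt, cand_lq                            # accept proposal
    return current, current_logp_target, current_logp_prop       # keep current sample
```

Because proposals are independent draws, the chain can jump between distant reasoning paths instead of making local edits, which is what lets it recover diversity that greedy remasking suppresses.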

From the abstract

Diffusion large language models (dLLMs) theoretically permit token decoding in arbitrary order, a flexibility that could enable richer exploration of reasoning paths than autoregressive (AR) LLMs. In practice, however, random-order decoding often hurts generation quality. To mitigate this, low-confidence remasking improves single-sample quality (e.g., Pass@1) by prioritizing confident tokens, but it also suppresses exploration and limits multi-sample gains (e.g., Pass@k), creating a fundamental quality-exploration dilemma.
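The remasking idea the abstract refers to can be sketched in a few lines: at each decoding step, commit only the positions where the model is most confident and remask the rest for re-decoding. This is a minimal illustration of the general scheme, not the paper's exact procedure.

```python
import numpy as np

def confidence_remask(probs, tokens, keep_k):
    """One step of low-confidence remasking (illustrative sketch).

    probs:  (L, V) per-position token probabilities from the dLLM.
    tokens: (L,) token ids sampled/argmaxed at each position.
    Keeps the keep_k positions with the highest confidence in their
    chosen token; all other positions are remasked for the next step.
    Returns a boolean array: True = keep, False = remask.
    """
    conf = probs[np.arange(len(tokens)), tokens]  # confidence of each chosen token
    keep = np.zeros(len(tokens), dtype=bool)
    keep[np.argsort(conf)[-keep_k:]] = True       # highest-confidence positions survive
    return keep
```

The dilemma is visible in the rule itself: always keeping the highest-confidence tokens makes each individual sample cleaner, but it collapses different runs onto the same high-probability decoding order, so drawing k samples explores far fewer distinct reasoning paths.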