AI & ML Paradigm Shift

Replaces standard autoregressive document OCR with a parallel diffusion-based denoising framework.

March 25, 2026

Original Paper

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Hejun Dong, Junbo Niu, Bin Wang, Weijun Zeng, Wentao Zhang, Conghui He

arXiv · 2603.22458

The Takeaway

Autoregressive OCR is prone to error propagation and high latency in long-form documents. By treating OCR as inverse rendering via diffusion, this method achieves 3.2x faster decoding and superior robustness to formatting shifts in complex layouts.
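To make the latency argument concrete, here is a minimal toy sketch of why parallel denoising needs fewer decoding steps than left-to-right generation. It is not the paper's algorithm: the function names, the `MASK` placeholder, the `tokens_per_step` parameter, and the "oracle" that copies the correct token into place are all assumptions made for illustration, and the step counts are not meant to reproduce the reported 3.2x figure.

```python
import math
import random

MASK = "<mask>"  # hypothetical placeholder for a not-yet-decoded position


def autoregressive_decode(target):
    """Emit one token per step, strictly left to right."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)  # stands in for one model forward pass
        steps += 1
    return out, steps


def parallel_denoise_decode(target, tokens_per_step, seed=0):
    """Start fully masked and fill up to `tokens_per_step` positions per step."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq:
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        chosen = rng.sample(masked, min(tokens_per_step, len(masked)))
        for i in chosen:
            seq[i] = target[i]  # oracle fill; a real model would predict this
        steps += 1
    return seq, steps


if __name__ == "__main__":
    tokens = "<table> <tr> <td> x^2 </td> </tr> </table>".split()
    _, ar_steps = autoregressive_decode(tokens)
    _, pd_steps = parallel_denoise_decode(tokens, tokens_per_step=3)
    print(ar_steps, pd_steps, math.ceil(len(tokens) / 3))
```

In this toy, the autoregressive loop always takes one step per token, while the parallel decoder takes roughly the sequence length divided by `tokens_per_step`. A real diffusion decoder pays for this with a more expensive forward pass per step and with the risk of committing to wrong tokens in parallel, so actual speedups depend on the model and the document.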

From the abstract

Optical character recognition (OCR) has evolved from line-level transcription to structured document parsing, requiring models to recover long-form sequences containing layout, tables, and formulas. Despite recent advances in vision-language models, most existing systems rely on autoregressive decoding, which introduces sequential latency and amplifies error propagation in long documents. In this work, we revisit document OCR from an inverse rendering perspective, arguing that left-to-right caus