This paper introduces a Markov-based discrete reasoning model that learns its own stopping criterion and can re-mask and correct its own mistakes.
March 18, 2026
Original Paper
Self-Aware Markov Models for Discrete Reasoning
arXiv · 2603.16661
The Takeaway
Unlike standard diffusion or transformer models that use a fixed number of steps, this architecture adaptively scales its computation to the difficulty of the problem. It achieves 95% validity on extreme Sudoku benchmarks, demonstrating that error correction and variable computation length can outperform fixed-path reasoning.
From the abstract
Standard masked discrete diffusion models face limitations in reasoning tasks due to their inability to correct their own mistakes on the masking path. Because they rely on a fixed number of denoising steps, they cannot adjust their computation to the complexity of a given problem. To address these limitations, we introduce a method based on learning a Markov transition kernel that is trained on its own outputs. This design enables tokens to be remasked, allowing the model to correct its previous mistakes.
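The remask-and-correct loop described above can be sketched in a few lines. This is a toy illustration, not the paper's method: `toy_model` stands in for the learned transition kernel, and the fixed confidence threshold is a simplification of the learned stopping criterion. All names and parameters here are illustrative assumptions.

```python
import numpy as np

MASK = -1  # sentinel id for a masked position (illustrative)

def toy_model(tokens, vocab_size, rng):
    """Stand-in for the learned Markov transition kernel: returns a
    probability distribution over the vocabulary at each position.
    A real model would be a trained network conditioned on `tokens`."""
    return rng.dirichlet(np.ones(vocab_size), size=len(tokens))

def sample_with_remasking(length=8, vocab_size=4, conf_threshold=0.9,
                          max_steps=50, seed=0):
    """Adaptive denoising with remasking:
    1. fill every masked position with the model's best guess,
    2. remask positions whose confidence falls below a threshold,
    3. stop as soon as every position is confident -- so easy inputs
       take few steps and hard ones take more (variable computation)."""
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)
    for step in range(max_steps):
        probs = toy_model(tokens, vocab_size, rng)
        conf = probs.max(axis=1)
        tokens = probs.argmax(axis=1)      # denoise: commit best guesses
        low = conf < conf_threshold
        if not low.any():                  # stopping criterion satisfied
            return tokens, step + 1
        tokens[low] = MASK                 # remask uncertain positions
    return tokens, max_steps               # may still contain MASK tokens
```

Note the contrast with standard masked diffusion: there, a token once unmasked stays fixed and the step count is set in advance; here, low-confidence tokens re-enter the masked state, and the number of steps is determined at sampling time.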