AI & ML Paradigm Shift

Breaking the 'capability ceiling' in LLM post-training by replacing full-history dependencies with explicit Markov states.

March 23, 2026

Original Paper

Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States

Yurun Yuan, Tengyang Xie

arXiv · 2603.19987

The Takeaway

The paper identifies a fundamental bottleneck: RL for LLMs acts only as a refiner of patterns already latent in the pre-trained model, not as an open-ended discoverer of new strategies. By reintroducing Markovian principles, it charts a pathway to genuinely new reasoning capabilities and reduced sample complexity.

From the abstract

Reinforcement learning (RL) has become a standard paradigm for post-training and aligning Large Language Models (LLMs), yet recent evidence suggests it faces a persistent "capability ceiling": unlike classical RL systems that discover novel strategies, RL for LLMs often acts as a mere refiner of patterns already latent in pre-trained weights. In this work, we identify a fundamental structural bottleneck: while classical RL relies on compact, informative Markov states, current LLM post-training …
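To make the distinction the abstract draws concrete, here is a minimal, hypothetical Python sketch (not the paper's actual construction): one policy stub whose conditioning context is the entire interaction history and so grows with every step, and one that compresses the history into a fixed-size Markov state. The class names and the trivial "keep only the latest token" update rule are illustrative stand-ins for a learned state summary.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: contrasts full-history conditioning
# with compact Markov-state conditioning. Names and the update
# rule are hypothetical, not from the paper.

@dataclass
class FullHistoryPolicy:
    """Conditions each action on the entire interaction history,
    so the policy's input grows with every step."""
    history: list = field(default_factory=list)

    def observe(self, token: str) -> None:
        self.history.append(token)

    def context_size(self) -> int:
        return len(self.history)  # grows without bound

@dataclass
class MarkovStatePolicy:
    """Compresses the history into a compact, fixed-size state;
    each action depends only on that state (Markov property)."""
    state: tuple = ("<start>",)

    def observe(self, token: str) -> None:
        # Any fixed-size update works; keeping only the latest
        # token stands in for a learned state summary.
        self.state = (token,)

    def context_size(self) -> int:
        return len(self.state)  # constant, regardless of history length

full, markov = FullHistoryPolicy(), MarkovStatePolicy()
for t in ["a", "b", "c", "d"]:
    full.observe(t)
    markov.observe(t)
print(full.context_size(), markov.context_size())  # → 4 1
```

The point of the contrast: a compact state keeps the effective decision problem small and well-posed as episodes lengthen, which is the structural property the paper argues current LLM post-training gives up.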