Breaking the 'capability ceiling' in LLM post-training by replacing full-history dependencies with explicit Markov states.
March 23, 2026
Original Paper
Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States
arXiv · 2603.19987
The Takeaway
Identifies a fundamental bottleneck: RL for LLMs acts only as a refiner of patterns already present in the pre-trained weights, not as an open-ended discoverer of new ones. By reintroducing Markovian state structure, the paper charts a path to genuinely new reasoning capabilities and lower sample complexity.
From the abstract
Reinforcement learning (RL) has become a standard paradigm for post-training and aligning Large Language Models (LLMs), yet recent evidence suggests it faces a persistent "capability ceiling": unlike classical RL systems that discover novel strategies, RL for LLMs often acts as a mere refiner of patterns already latent in pre-trained weights. In this work, we identify a fundamental structural bottleneck: while classical RL relies on compact, informative Markov states, current LLM post-training conditions instead on the full, ever-growing interaction history.
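
To make the contrast concrete, here is a minimal Python sketch of the two state conventions the abstract draws apart: a "state" that is the entire transcript (the typical LLM post-training setup) versus a compact, bounded Markov state. This is an illustration under assumptions, not the paper's method; the `compress` function and both state classes are hypothetical stand-ins for whatever state abstraction the paper actually proposes.

```python
from dataclasses import dataclass, field


def compress(summary: str, observation: str, max_len: int = 256) -> str:
    """Hypothetical state abstraction: fold the new observation into a
    bounded-size summary so the next decision depends only on `summary`."""
    return (summary + " | " + observation)[-max_len:]


@dataclass
class FullHistoryState:
    """Typical LLM RL view: the conditioning 'state' is the whole transcript."""
    history: list[str] = field(default_factory=list)

    def observe(self, obs: str) -> None:
        self.history.append(obs)  # grows without bound over the episode

    def policy_input(self) -> str:
        return "\n".join(self.history)  # O(T) tokens after T steps


@dataclass
class MarkovState:
    """Markovian view: a compact, informative state replaces the transcript."""
    summary: str = ""

    def observe(self, obs: str) -> None:
        self.summary = compress(self.summary, obs)  # bounded at every step

    def policy_input(self) -> str:
        return self.summary  # bounded size regardless of T


if __name__ == "__main__":
    full, markov = FullHistoryState(), MarkovState()
    for t in range(1000):
        obs = f"step {t}: tool output ..."
        full.observe(obs)
        markov.observe(obs)
    # Full-history input keeps growing; the Markov state stays bounded.
    print(len(full.policy_input()), len(markov.policy_input()))
```

The point of the sketch is only the interface difference: a policy conditioned on `MarkovState.policy_input()` sees a fixed-size, decision-sufficient state, which is the property classical RL exploits and which the paper argues LLM post-training currently lacks.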