AI & ML New Capability

Learning to Self-Evolve (LSE) trains LLMs to explicitly improve their own context at test-time via reinforcement learning.

arXiv · March 20, 2026 · 2603.18620

Xiaoyin Chen, Canwen Xu, Yite Wang, Boyi Liu, Zhewei Yao, Yuxiong He

The Takeaway

Rather than relying on a model's inherent zero-shot reasoning, this framework treats self-improvement as a learnable skill, outperforming GPT-4.5 and Claude 3.5 Sonnet on self-evolution tasks. It enables a small (4B-parameter) model to surpass frontier models at dynamic context refinement.

From the abstract

We introduce Learning to Self-Evolve (LSE), a reinforcement learning framework that trains large language models (LLMs) to improve their own contexts at test time. We situate LSE in the setting of test-time self-evolution, where a model iteratively refines its context from feedback on seen problems to perform better on new ones. Existing approaches rely entirely on the inherent reasoning ability of the model and never explicitly train it for this task. LSE reduces the multi-step evolution problem […]
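To make the test-time self-evolution setting concrete, here is a minimal sketch of the loop the abstract describes: attempt seen problems, collect feedback, fold the lessons into a running context, then tackle a new problem with the evolved context. All names (`solve`, `feedback`, `refine_context`) are illustrative stand-ins, not the paper's actual API, and the toy "model" is a string matcher rather than an LLM.

```python
# Hypothetical sketch of test-time self-evolution; not the paper's implementation.

def solve(problem: str, context: str) -> str:
    """Stand-in for an LLM call: succeeds only if the context mentions the problem."""
    return f"answer({problem})" if problem in context else "unknown"

def feedback(problem: str, answer: str) -> bool:
    """Stand-in for environment feedback on a *seen* problem."""
    return answer != "unknown"

def refine_context(context: str, problem: str) -> str:
    """Fold a lesson from a failed attempt into the running context."""
    return context + f" note:{problem}"

def self_evolve(seen_problems, new_problem, context=""):
    # Iteratively refine the context using feedback on seen problems...
    for p in seen_problems:
        answer = solve(p, context)
        if not feedback(p, answer):
            context = refine_context(context, p)
    # ...then attempt the new problem with the evolved context.
    return solve(new_problem, context), context
```

In LSE, the refinement step itself is what gets trained with reinforcement learning, rather than being left to the model's zero-shot reasoning as in prior approaches.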