Learning to Self-Evolve (LSE) trains LLMs to explicitly improve their own context at test-time via reinforcement learning.
arXiv · March 20, 2026 · 2603.18620
The Takeaway
Instead of relying on a model's zero-shot reasoning, this framework treats self-improvement as a learnable skill; the trained models outperform GPT-4.5 and Claude 3.5 Sonnet on self-evolution tasks, and small models (4B parameters) surpass frontier models at dynamic context refinement.
From the abstract
We introduce Learning to Self-Evolve (LSE), a reinforcement learning framework that trains large language models (LLMs) to improve their own contexts at test time. We situate LSE in the setting of test-time self-evolution, where a model iteratively refines its context from feedback on seen problems to perform better on new ones. Existing approaches rely entirely on the inherent reasoning ability of the model and never explicitly train it for this task. LSE reduces the multi-step evolution problem…