AI & ML New Capability

Researchers have used LLMs to evolve entirely new reinforcement learning update rules from scratch, and the discovered rules compete with human-designed baselines such as PPO and SAC.

March 31, 2026

Original Paper

Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models

Alkis Sygkounas, Amy Loutfi, Andreas Persson

arXiv · 2603.28416

The Takeaway

Instead of just optimizing reward functions, this framework evolves executable Python code for the learning update rule itself. It discovers competitive, non-standard algorithms that do not use canonical mechanisms such as a TD loss or value bootstrapping, marking a shift toward automated algorithm discovery.
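To make the idea of "evolving executable update rules" concrete, here is a minimal sketch of such a search loop. It is not the paper's code. Candidates are Python source strings that define an `update` function, fitness is measured on a toy quadratic objective rather than an RL environment, and the hypothetical `mutate` helper is a simple string edit standing in for the LLM variation operator. All names (`SEED_SOURCE`, `compile_rule`, `fitness`, `mutate`, `evolve`) are assumptions made for illustration.

```python
import math
import random
from typing import Callable, Tuple

# Hypothetical seed candidate: source text defining an update rule.
SEED_SOURCE = """
def update(theta, grad, lr):
    return theta - lr * grad
"""


def compile_rule(source: str) -> Callable:
    """Execute candidate source and return the `update` function it defines."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["update"]


def fitness(source: str, steps: int = 50) -> float:
    """Toy stand-in for a training run: minimize (theta - 3)^2.

    Returns the negative final loss, or -inf if the candidate fails to
    compile, raises, or produces a non-finite value.
    """
    try:
        update = compile_rule(source)
        theta = 0.0
        for _ in range(steps):
            grad = 2.0 * (theta - 3.0)
            theta = float(update(theta, grad, 0.1))
        score = -((theta - 3.0) ** 2)
        return score if math.isfinite(score) else float("-inf")
    except Exception:
        return float("-inf")


def mutate(source: str, rng: random.Random) -> str:
    """Placeholder for an LLM variation operator (assumption).

    A real system would prompt a language model with parent code and
    feedback; here the step size in the source text is simply rescaled.
    """
    factor = rng.choice([0.5, 1.5, 2.0])
    return source.replace("lr * grad", f"{factor} * lr * grad", 1)


def evolve(generations: int = 5, population_size: int = 6,
           survivors: int = 2, seed: int = 0) -> Tuple[str, float]:
    """Keep the fittest candidates each generation and mutate them."""
    rng = random.Random(seed)
    population = [SEED_SOURCE]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:survivors]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


best_source, best_score = evolve()
```

Because the surviving parents are carried over unchanged each generation, the best score never falls below that of the seed rule. In a setting like the one the paper describes, the fitness call would presumably be a full training run and the mutation step an LLM query, both far more expensive than this toy version.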

From the abstract

Reinforcement learning algorithms are defined by their learning update rules, which are typically hand-designed and fixed. We present an evolutionary framework for discovering reinforcement learning algorithms by searching directly over executable update rules that implement complete training procedures. The approach builds on REvolve, an evolutionary system that uses large language models as generative variation operators, and extends it from reward-function discovery to algorithm discovery. To