AI & ML New Capability

POISE demonstrates the first autonomous, evidence-driven discovery of improved policy optimization algorithms for LLMs.

March 26, 2026

Original Paper

From AI Assistant to AI Scientist: Autonomous Discovery of LLM-RL Algorithms with LLM Agents

Sirui Xia, Yikai Zhang, Aili Chen, Siye Wu, Siyu Yuan, Yanghua Xiao

arXiv · 2603.23951

The Takeaway

POISE moves LLMs from the role of assistant toward that of 'scientist': an agent that can independently propose, test, and refine changes to training algorithms. The system discovered new variants of GRPO that significantly improved performance on math benchmarks (AIME25), suggesting a shift toward automated algorithm development.
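For context, GRPO (the baseline these variants modify) scores each sampled response relative to the other responses drawn for the same prompt. The sketch below shows only that standard group-relative advantage, not any variant discovered by POISE. The function name, the `eps` constant, and the use of the population standard deviation are illustrative assumptions; implementations differ on these details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward within its sampling group.

    Hypothetical helper for illustration. `rewards` holds the scalar rewards
    of all responses sampled for one prompt.
    """
    mean = statistics.fmean(rewards)
    # Population standard deviation is an assumption; some codebases use the
    # sample (unbiased) estimate instead.
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four responses to one prompt: two judged correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group mean receive a positive advantage and the rest a negative one, so no separate value network is needed. Mechanisms of this kind (how rewards are normalized, clipped, or weighted) are the sort of thing an automated search over policy optimization algorithms could vary.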

From the abstract

Discovering improved policy optimization algorithms for language models remains a costly manual process requiring repeated mechanism-level modification and validation. Unlike simple combinatorial code search, this problem requires searching over algorithmic mechanisms tightly coupled with training dynamics while reusing empirical evidence across iterations. We propose POISE, a closed-loop framework for automated discovery of policy optimization algorithms for language models. POISE maintains a s