AI & ML · New Capability

LEAFE allows LLM agents to internalize feedback as actionable experience, enabling them to backtrack and recover from failures autonomously.

arXiv · March 18, 2026 · 2603.16843

Rui Ge, Yichao Fu, Yuyang Qian, Junda Su, Yiming Zhao, Peng Zhao, Hao Zhang

The Takeaway

Unlike standard RL, which optimizes only for final success, this framework uses environment feedback to improve the agent's internal reasoning loop. It effectively expands the problem-solving capacity (Pass@k) of agents on long-horizon tasks such as coding and interactive planning.
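Pass@k is the standard any-of-k success metric: the probability that at least one of k sampled attempts solves the task. With n attempts per problem, c of them correct, the usual unbiased estimator (from the HumanEval paper) avoids enumerating k-subsets; a minimal version:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n attempts (c correct)
    succeeds, i.e. 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures, so every k-sample must contain a success
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```

The metric makes the paper's framing concrete: a policy suffering from distribution sharpening can raise Pass@1 by re-producing a narrow set of known solutions while Pass@k stagnates, because additional samples stop adding diversity.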

From the abstract

Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily optimize final success signals, leaving rich environment feedback underutilized. Consequently, they often lead to distribution sharpening: the policy becomes better at reproducing a narrow set of already…
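The excerpt cuts off before the method itself, so the loop the summary points at can only be sketched. Below is a minimal, purely illustrative version of "internalize feedback, then backtrack and retry": every name here (the toy environment, snapshot/restore, the lesson list) is a hypothetical stand-in, not the paper's API.

```python
from dataclasses import dataclass
import random

@dataclass
class ToyEnv:
    """Toy stand-in for a long-horizon environment with rich feedback."""
    target: int = 7
    state: int = 0

    def snapshot(self) -> int:          # checkpoint current progress
        return self.state

    def restore(self, snap: int) -> None:
        self.state = snap

    def step(self, action: int):
        self.state += action
        failed = self.state > self.target
        feedback = f"overshot: state={self.state}" if failed else ""
        return feedback, failed, self.state == self.target

def act(state: int, lessons: list[str]) -> int:
    # A real agent would condition an LLM on the state plus accumulated
    # lessons; here a lesson just makes the policy take smaller steps.
    return 1 if lessons else random.choice([1, 5])

def run(env: ToyEnv, max_steps: int = 50) -> bool:
    lessons: list[str] = []
    checkpoint = env.snapshot()
    for _ in range(max_steps):
        feedback, failed, done = env.step(act(env.state, lessons))
        if failed:
            lessons.append(feedback)     # internalize feedback as experience
            env.restore(checkpoint)      # backtrack instead of terminating
        elif done:
            return True
        else:
            checkpoint = env.snapshot()  # advance the recovery point
    return False

print(run(ToyEnv()))  # prints True; the agent backtracks if it overshoots
```

The design point the abstract gestures at is the contrast with outcome-only RL: here the failure feedback changes future behavior within the episode, rather than being collapsed into a single terminal reward.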