LEAFE allows LLM agents to internalize feedback as actionable experience, enabling them to backtrack and recover from failures autonomously.
arXiv · March 18, 2026 · 2603.16843
The Takeaway
Unlike standard RL, which optimizes only for final success, this framework uses environment feedback to improve the agent's internal reasoning loop. In effect, it expands the range of problems an agent can solve (measured as Pass@k) on long-horizon tasks such as coding and interactive planning.
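For reference, Pass@k is the standard coverage metric here: the probability that at least one of k sampled attempts solves a task. Below is a minimal sketch of the usual unbiased estimator (from Chen et al., 2021, not code from this paper):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: probability that at least one of k samples,
    drawn without replacement from n attempts (c correct), succeeds."""
    if n - c < k:
        return 1.0  # too few failures: every size-k draw contains a success
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g., 20 attempts with 3 correct: pass@1 = 0.15, pass@10 ≈ 0.89
print(pass_at_k(20, 3, 1), pass_at_k(20, 3, 10))
```

Intuitively, a policy that concentrates on one memorized solution path can keep Pass@1 flat while Pass@k stagnates; methods that preserve diverse, recoverable behavior raise the latter.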
From the abstract
Large language models are increasingly deployed as autonomous agents that must plan, act, and recover from mistakes through long-horizon interaction with environments that provide rich feedback. However, prevailing outcome-driven post-training methods (e.g., RL with verifiable rewards) primarily optimize final success signals, leaving rich environment feedback underutilized. Consequently, they often lead to distribution sharpening: the policy becomes better at reproducing a narrow set of already successful solutions.
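To make the recovery idea concrete, here is a minimal, hypothetical sketch of an agent loop that internalizes failure feedback as reusable experience and backtracks to the last known-good step. The Environment interface (step, undo, done, observation) and all names are illustrative assumptions, not the paper's actual algorithm:

```python
from dataclasses import dataclass, field

MAX_STEPS = 50  # arbitrary budget for the sketch

@dataclass
class Step:
    action: str
    feedback: str
    ok: bool

@dataclass
class FeedbackAgent:
    """Hypothetical loop: failures become textual 'experience' that
    conditions later proposals, and the agent backtracks to the last
    known-good step instead of restarting from scratch."""
    experience: list = field(default_factory=list)
    trajectory: list = field(default_factory=list)

    def run(self, env, propose) -> bool:
        for _ in range(MAX_STEPS):
            # Condition the policy on accumulated experience notes.
            action = propose(env.observation(), self.experience)
            feedback, ok = env.step(action)  # assumed env API
            self.trajectory.append(Step(action, feedback, ok))
            if ok and env.done():
                return True
            if not ok:
                # Internalize the feedback as actionable experience ...
                self.experience.append(f"avoid {action!r}: {feedback}")
                # ... then roll back to the last successful step.
                while self.trajectory and not self.trajectory[-1].ok:
                    env.undo()  # assumed: reverts one environment step
                    self.trajectory.pop()
        return False
```

The key contrast with outcome-only RL is visible in the loop: feedback is consumed at every step to reshape the next proposal, rather than being collapsed into a single terminal reward.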