Introduces AgentHER, a framework that salvages 'failed' agent trajectories by relabeling them as successful demonstrations for alternative goals.
March 24, 2026
Original Paper
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
arXiv · 2603.21357
The Takeaway
AgentHER addresses the massive data waste in LLM agent training by converting the 85%+ of trajectories that fail into high-quality SFT/DPO data. The method achieves 2x data efficiency, matching baseline performance with half the successful demonstrations across a range of model sizes.
From the abstract
LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER, a framework that recovers this lost training signal by adapting the Hindsight Experience Replay (HER; Andrychowicz et al., 2017) principle to natural-language agent trajectories f
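The core HER-style relabeling step can be illustrated with a minimal sketch. The `Trajectory` record and `hindsight_relabel` function below are hypothetical names for illustration, not the paper's actual API: a failed trajectory is reused as a successful demonstration for the goal it actually achieved.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Trajectory:
    goal: str            # natural-language instruction the agent was given
    actions: List[str]   # sequence of agent actions taken
    achieved: str        # natural-language description of the final outcome
    success: bool        # whether the trajectory satisfied the original goal

def hindsight_relabel(traj: Trajectory) -> Trajectory:
    """HER-style relabeling (Andrychowicz et al., 2017), adapted to
    language goals: treat a failed trajectory as a successful
    demonstration for the outcome it actually reached."""
    if traj.success:
        return traj  # already a usable demonstration as-is
    # Swap the unmet goal for the achieved outcome and mark as success.
    return replace(traj, goal=traj.achieved, success=True)
```

Relabeled trajectories can then feed standard SFT or DPO pipelines as positive examples, which is how the method recovers training signal from otherwise discarded rollouts.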