Introduces AgentHER, a framework that salvages 'failed' agent trajectories by relabeling them as successful demonstrations for alternative goals.
March 24, 2026
Original Paper
AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
arXiv · 2603.21357
The Takeaway
AgentHER addresses the massive data waste in LLM agent training by converting the 85%+ of trajectories that fail into high-quality SFT/DPO data. The method achieves 2x data efficiency, matching baseline performance with half the successful demonstrations across a range of model sizes.
From the abstract
LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER, a framework that recovers this lost training signal by adapting the Hindsight Experience Replay (HER; Andrychowicz et al., 2017) principle to natural-language agent trajectories f
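The core HER-style relabeling step can be illustrated with a minimal sketch. The `Trajectory` record and `hindsight_relabel` function below are hypothetical names for illustration, not the paper's actual API: a failed trajectory is reused as a successful demonstration for the goal it actually achieved.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Trajectory:
    goal: str            # natural-language instruction the agent was given
    actions: List[str]   # sequence of agent actions taken
    achieved: str        # natural-language description of the final outcome
    success: bool        # whether the trajectory satisfied the original goal

def hindsight_relabel(traj: Trajectory) -> Trajectory:
    """HER-style relabeling (Andrychowicz et al., 2017), adapted to
    language goals: treat a failed trajectory as a successful
    demonstration for the outcome it actually reached."""
    if traj.success:
        return traj  # already a usable demonstration as-is
    # Swap the unmet goal for the achieved outcome and mark as success.
    return replace(traj, goal=traj.achieved, success=True)
```

Relabeled trajectories can then feed standard SFT or DPO pipelines as positive examples, which is how the method recovers training signal from otherwise discarded rollouts.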