AI & ML New Capability

Boosts open-model agent performance on web navigation tasks from 6.4% to 43%, surpassing proprietary models like GPT-4o.

March 23, 2026

Original Paper

A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette

arXiv · 2603.19685

The Takeaway

The MiRA framework uses dense, milestone-based reward signals to address the sparse-reward problem in long-horizon LLM tasks. This lets the 12B Gemma3 model outperform much larger proprietary systems on web navigation, demonstrating a path for small, open-weights models to handle complex agentic workflows.
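The paper's exact milestone definitions are not reproduced here, but the core idea of replacing a single sparse terminal reward with dense, milestone-based credit can be sketched minimally. In this illustration (all names are hypothetical), milestones are ordered boolean predicates over observed states, and each step earns reward for every milestone it newly completes:

```python
from typing import Callable, List


def milestone_rewards(
    trajectory: List[str],
    milestones: List[Callable[[str], bool]],
) -> List[float]:
    """Return a per-step reward for each state in the trajectory.

    Instead of one sparse reward at episode end, each step is credited
    1/len(milestones) for every milestone it newly satisfies, so partial
    progress on a long-horizon task still produces a learning signal.
    """
    rewards = []
    next_idx = 0  # index of the next milestone still to be reached
    for state in trajectory:
        step_reward = 0.0
        # A single step may complete several consecutive milestones.
        while next_idx < len(milestones) and milestones[next_idx](state):
            step_reward += 1.0 / len(milestones)
            next_idx += 1
        rewards.append(step_reward)
    return rewards


# Toy web-navigation episode: states are page identifiers (hypothetical).
milestones = [
    lambda s: "search_results" in s,
    lambda s: "product_page" in s,
    lambda s: "order_confirmed" in s,
]
trajectory = ["home", "search_results", "product_page", "cart", "order_confirmed"]
print(milestone_rewards(milestones=milestones, trajectory=trajectory))
```

With a sparse reward, only the final state would carry signal; here every milestone crossing does, which is the property the framework exploits for long horizons.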

From the abstract

Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing LLM-based agents struggle with long-horizon planning in two main ways. During online execution, they often lose track as new information arrives, lacking a clear and adaptive path to the goal.
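The excerpt doesn't specify how the framework keeps an agent on track as new observations arrive, but the subgoal-driven idea suggests a simple mechanism: maintain an ordered list of subgoals and advance past whichever ones the latest observation already satisfies, so the agent always has a concrete next target. A minimal sketch, with all names and predicates invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Subgoal:
    name: str
    done: Callable[[str], bool]  # predicate over the latest observation


class SubgoalTracker:
    """Keeps the agent pointed at the next unfinished subgoal as new
    observations arrive, instead of replanning from scratch each step."""

    def __init__(self, subgoals: List[Subgoal]) -> None:
        self.subgoals = list(subgoals)
        self.idx = 0

    def update(self, observation: str) -> Optional[str]:
        # Advance past every subgoal the new observation already satisfies.
        while self.idx < len(self.subgoals) and self.subgoals[self.idx].done(observation):
            self.idx += 1
        if self.idx < len(self.subgoals):
            return self.subgoals[self.idx].name  # current target
        return None  # task complete


# Toy usage: a two-subgoal web task (hypothetical page identifiers).
tracker = SubgoalTracker([
    Subgoal("open_search", lambda o: "results" in o),
    Subgoal("open_item", lambda o: "item_page" in o),
])
print(tracker.update("home"))       # still working on the first subgoal
print(tracker.update("results"))    # first done, target the second
print(tracker.update("item_page"))  # all subgoals satisfied
```

The current subgoal name would be injected into the LLM's prompt each step, giving the model the "clear and adaptive path" the abstract says existing agents lack.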