Enables multimodal agents to continually improve from experience and skills without any parameter updates through a dual-stream visual grounding framework.
arXiv · March 13, 2026 · 2603.12056
Why it matters
Solves a major hurdle for deployed agents: the ability to learn and refine planning/tool-use strategies in open-ended environments without the risks and costs associated with constant fine-tuning.
From the abstract
Multimodal agents can now tackle complex reasoning tasks with diverse tools, yet they still suffer from inefficient tool use and inflexible orchestration in open-ended settings. A central challenge is enabling such agents to continually improve without parameter updates by learning from past trajectories. We identify two complementary forms of reusable knowledge essential for this goal: experiences, providing concise action-level guidance for tool selection and decision making, and skills, provi