AI & ML · New Capability

Enables multimodal agents to improve continually from past experiences and skills, without any parameter updates, through a dual-stream visual grounding framework.

arXiv · March 13, 2026 · 2603.12056

Guanyu Jiang, Zhaochen Su, Xiaoye Qu, Yi R. Fung

Why it matters

Addresses a major hurdle for deployed agents: learning and refining planning and tool-use strategies in open-ended environments without the risks and costs of constant fine-tuning.

From the abstract

Multimodal agents can now tackle complex reasoning tasks with diverse tools, yet they still suffer from inefficient tool use and inflexible orchestration in open-ended settings. A central challenge is enabling such agents to continually improve without parameter updates by learning from past trajectories. We identify two complementary forms of reusable knowledge essential for this goal: experiences, providing concise action-level guidance for tool selection and decision making, and skills, provi

