AI & ML New Capability

Enables GUI agents to reduce domain bias by retrieving web tutorial videos at runtime and learning application-specific workflows from them, without retraining.

March 30, 2026

Original Paper

GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation

Rui Xie, Zhi Gao, Chenrui Shi, Zirui Shang, Lu Chen, Qing Li

arXiv · 2603.26266

The Takeaway

GUIDE uses a video-RAG pipeline to extract planning knowledge (operation workflows) and grounding knowledge (UI element layouts) from existing online tutorial videos, then injects that knowledge into the agent at runtime. This addresses a major barrier to deploying GUI agents in specialized professional software, where training data is scarce.
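
To make the retrieve-then-inject idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the `TutorialNote` type, the keyword-overlap scoring, the helper names, and the sample notes are all hypothetical stand-ins for whatever retrieval and annotation steps GUIDE actually uses. It assumes each tutorial video has already been reduced to a short planning note and a short grounding note.

```python
from dataclasses import dataclass


@dataclass
class TutorialNote:
    """Hypothetical record distilled from one tutorial video."""
    title: str
    planning: str   # workflow steps for the task
    grounding: str  # where the relevant UI elements are


def score(query: str, note: TutorialNote) -> int:
    """Count words shared by the query and the note title (toy retriever)."""
    return len(set(query.lower().split()) & set(note.title.lower().split()))


def retrieve(query: str, notes: list[TutorialNote], k: int = 1) -> list[TutorialNote]:
    """Return up to k notes with the highest non-zero overlap score."""
    ranked = sorted(notes, key=lambda n: score(query, n), reverse=True)
    return [n for n in ranked[:k] if score(query, n) > 0]


def build_prompt(task: str, notes: list[TutorialNote]) -> str:
    """Prepend retrieved hints to the task; the agent's weights are untouched."""
    lines = [f"Task: {task}"]
    for n in notes:
        lines.append(f"Planning hint: {n.planning}")
        lines.append(f"Grounding hint: {n.grounding}")
    return "\n".join(lines)


# Made-up sample notes, for illustration only.
NOTES = [
    TutorialNote(
        "export chart as image in spreadsheet app",
        "Select the chart, open File, choose Export.",
        "Export is in the File menu at the top left.",
    ),
    TutorialNote(
        "apply blur filter in photo editor",
        "Open Filters, pick Blur, set radius.",
        "Filters is in the top menu bar.",
    ),
]

task = "export a chart from the spreadsheet"
prompt = build_prompt(task, retrieve(task, NOTES))
print(prompt)
```

The point of the sketch is the plug-and-play shape: the hints travel in the prompt, so the same agent can be pointed at a new application by swapping the retrieved notes rather than by retraining.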

From the abstract

Large vision-language models have endowed GUI agents with strong general capabilities for interface understanding and interaction. However, due to insufficient exposure to domain-specific software operation data during training, these agents exhibit significant domain bias - they lack familiarity with the specific operation workflows (planning) and UI element layouts (grounding) of particular applications, limiting their real-world task performance. In this paper, we present GUIDE (GUI Unbiasing
