AI & ML Efficiency Breakthrough

Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.

arXiv · March 16, 2026 · 2603.12823

Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen

Why it matters

Current agents route every UI action to expensive models like GPT-4o; this framework uses a lightweight routing layer to escalate only difficult tasks. It provides a practical blueprint for deploying high-reliability agents at a fraction of the current token cost.
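The excerpt does not spell out the routing rule, but the idea of escalating only difficult actions can be sketched as a confidence-threshold router. Everything below is illustrative: `cheap_model`, `expensive_model`, the `Prediction` type, and the 0.8 threshold are hypothetical stand-ins, not the paper's actual components.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    action: str        # e.g. "click(x=120, y=340)"
    confidence: float  # the model's grounding confidence in [0, 1]


def cheap_model(screenshot: bytes, instruction: str) -> Prediction:
    # Stand-in for a small, inexpensive VLM.
    return Prediction(action="click(x=120, y=340)", confidence=0.92)


def expensive_model(screenshot: bytes, instruction: str) -> Prediction:
    # Stand-in for a large model such as GPT-4o.
    return Prediction(action="click(x=118, y=338)", confidence=0.99)


def route(screenshot: bytes, instruction: str,
          threshold: float = 0.8) -> Prediction:
    """Try the cheap model first; escalate only when it is unsure."""
    pred = cheap_model(screenshot, instruction)
    if pred.confidence >= threshold:
        return pred
    return expensive_model(screenshot, instruction)
```

With this shape, most actions never touch the expensive model, which is where the cost savings would come from; the threshold trades cost against accuracy.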

From the abstract

Computer Use Agents (CUAs) translate natural-language instructions into Graphical User Interface (GUI) actions such as clicks, keystrokes, and scrolls by relying on a Vision-Language Model (VLM) to interpret screenshots and predict grounded tool calls. However, grounding accuracy varies dramatically across VLMs, while current CUA systems typically route every action to a single fixed model regardless of difficulty. We propose \textbf{Adaptive VLM Routing} (AVR), a framework that inserts a lightw