CUA-Suite releases 55 hours of continuous 30 fps video of expert human computer use, addressing the 'missing ingredient' for desktop automation agents.
March 26, 2026
Original Paper
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
arXiv · 2603.24440
The Takeaway
Standard agent datasets consist of sparse screenshots; CUA-Suite instead provides continuous video with kinematic traces and multi-layer reasoning annotations. The release opens up training of general-purpose computer agents by making available the high-frequency temporal data previously held only by major industry players.
From the abstract
Computer-use agents (CUAs) hold great promise for automating complex desktop workflows, yet progress toward general-purpose agents is bottlenecked by the scarcity of continuous, high-quality human demonstration videos. Recent work emphasizes that continuous video, not sparse screenshots, is the critical missing ingredient for scaling these agents. However, the largest existing open dataset, ScaleCUA, contains only 2 million screenshots, equating to less than 20 hours of video. To address this bo
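
The abstract's conversion of 2 million screenshots into "less than 20 hours of video" can be checked with simple arithmetic. The sketch below assumes the screenshots are counted as frames at 30 fps, the rate stated for CUA-Suite; the abstract excerpt does not state the rate it used, and the function names are illustrative only.

```python
# Assumption: frames are converted to video time at 30 fps, the rate
# stated for CUA-Suite. The rate behind the abstract's own figure is
# not given in the excerpt.
FPS = 30
SECONDS_PER_HOUR = 3600


def frames_to_hours(frames: int, fps: int = FPS) -> float:
    """Duration in hours of `frames` frames played back at `fps`."""
    return frames / fps / SECONDS_PER_HOUR


def hours_to_frames(hours: float, fps: int = FPS) -> int:
    """Number of frames in `hours` of video recorded at `fps`."""
    return round(hours * fps * SECONDS_PER_HOUR)


# 2 million screenshots treated as 30 fps frames
print(f"{frames_to_hours(2_000_000):.1f} hours")  # 18.5 hours

# 55 hours of continuous 30 fps video
print(f"{hours_to_frames(55):,} frames")  # 5,940,000 frames
```

Under that assumption, 2 million screenshots correspond to roughly 18.5 hours, consistent with "less than 20 hours", while 55 hours of 30 fps video amounts to about 5.9 million frames.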