ROVED cuts the expensive human feedback required for preference-based RL by up to 90%, using vision-language embeddings and uncertainty filtering.
March 31, 2026
Original Paper
Reducing Oracle Feedback with Vision-Language Embeddings for Preference-Based RL
arXiv · 2603.28053
The Takeaway
ROVED uses lightweight vision-language embedding (VLE) models to label the common, easy preference queries and requests human 'oracle' intervention only for high-uncertainty samples. This drastically lowers the barrier to training complex robotic manipulation policies from human preferences.
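A minimal sketch of that routing idea, assuming the VLE scores each trajectory segment by embedding similarity to a task description; this is not the paper's released code, and the cosine-similarity scorer, the 0.1 temperature, the confidence margin, and the `oracle` callable are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def vle_preference(emb_a: torch.Tensor, emb_b: torch.Tensor,
                   task_emb: torch.Tensor) -> float:
    """Soft preference P(A > B) from embedding similarity to the task text."""
    sim_a = F.cosine_similarity(emb_a, task_emb, dim=-1).mean()
    sim_b = F.cosine_similarity(emb_b, task_emb, dim=-1).mean()
    # A temperature-scaled softmax turns the similarity gap into a probability.
    return torch.softmax(torch.stack([sim_a, sim_b]) / 0.1, dim=0)[0].item()

def label_pairs(pairs, task_emb, oracle, margin=0.1):
    """Keep confident VLE labels; route ambiguous pairs to the human oracle."""
    labels, oracle_calls = [], 0
    for emb_a, emb_b in pairs:
        p = vle_preference(emb_a, emb_b, task_emb)
        if abs(p - 0.5) >= margin:      # VLE is confident: use its label
            labels.append(1.0 if p > 0.5 else 0.0)
        else:                           # high uncertainty: ask the human
            labels.append(oracle(emb_a, emb_b))
            oracle_calls += 1
    return labels, oracle_calls
```

The savings come from the `margin` gate: only pairs the VLE scores near 50/50 cost an oracle query, so cheap embedding comparisons absorb the bulk of the labeling work.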
From the abstract
Preference-based reinforcement learning can learn effective reward functions from comparisons, but its scalability is constrained by the high cost of oracle feedback. Lightweight vision-language embedding (VLE) models provide a cheaper alternative, but their noisy outputs limit their effectiveness as standalone reward generators. To address this challenge, we propose ROVED, a hybrid framework that combines VLE-based supervision with targeted oracle feedback. Our method uses the VLE to generate […]
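For context, the labels such a hybrid pipeline produces typically feed the standard preference-based reward-learning objective, a Bradley-Terry style cross-entropy over summed segment rewards. A short sketch of that step, under the assumption that ROVED follows this common formulation (the abstract does not spell out the exact loss):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_net, seg_a, seg_b, label):
    """label ~ P(A preferred over B), supplied by the VLE or the oracle."""
    r_a = reward_net(seg_a).sum()   # summed predicted reward over segment A
    r_b = reward_net(seg_b).sum()
    logit = r_a - r_b               # Bradley-Terry: P(A > B) = sigmoid(r_a - r_b)
    return F.binary_cross_entropy_with_logits(logit, torch.tensor(label))
```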