AI & ML Paradigm Shift

SOLE-R1 uses chain-of-thought reasoning from a video-language model as the sole reward signal for zero-shot robotic reinforcement learning.

March 31, 2026

Original Paper

SOLE-R1: Video-Language Reasoning as the Sole Reward for On-Robot Reinforcement Learning

Philip Schroeder, Thomas Weng, Karl Schmeckpeper, Eric Rosen, Stephen Hart, Ondrej Biza

arXiv · 2603.28730

The Takeaway

SOLE-R1 eliminates the need for manual reward engineering, success detectors, or demonstrations in robot learning by leveraging the spatiotemporal reasoning of video-language models. This lets robots learn complex, previously unseen manipulation tasks from scratch using only raw video observations and natural-language goals.
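To make the idea concrete, here is a minimal sketch of the training pattern the takeaway describes: an RL-style loop in which the only reward comes from scoring a rollout's video against a language goal. This is not the paper's implementation; `vlm_reward` is a hypothetical stand-in for the video-language reasoning model, and the 1-D "robot" is a toy environment chosen so the sketch runs on its own.

```python
def vlm_reward(video_frames, goal_text):
    """Hypothetical stand-in for a video-language reasoning model.

    A real system would prompt the model with the rollout video and the
    natural-language goal, and parse its chain-of-thought verdict into a
    scalar. Here we fake it: the goal text ends in a target value, and
    the reward is the fraction of frames near that target."""
    target = float(goal_text.split()[-1])
    hits = sum(1 for s in video_frames if abs(s - target) < 0.1)
    return hits / len(video_frames)

def rollout(policy_gain, goal=1.0, steps=20):
    """Toy 1-D 'robot': the state moves toward the goal at a rate set by
    the policy parameter (a simple proportional controller)."""
    state, frames = 0.0, []
    for _ in range(steps):
        state += policy_gain * (goal - state)
        frames.append(state)
    return frames

def train(goal_text="reach 1.0", candidates=(0.01, 0.1, 0.5)):
    """Select a policy purely by the VLM-style reward on its rollout
    video -- no hand-coded success detector or demonstrations."""
    scored = {g: vlm_reward(rollout(g), goal_text) for g in candidates}
    return max(scored, key=scored.get)

best_gain = train()  # the fastest-converging policy scores highest
```

The point of the sketch is the interface, not the controller: the learner never sees a ground-truth success signal, only the scalar that the (here simulated) video-language evaluator assigns to its own observation video.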

From the abstract

Vision-language models (VLMs) have shown impressive capabilities across diverse tasks, motivating efforts to leverage these models to supervise robot learning. However, when used as evaluators in reinforcement learning (RL), today's strongest models often fail under partial observability and distribution shift, enabling policies to exploit perceptual errors rather than solve the task. To address this limitation, we introduce SOLE-R1 (Self-Observing LEarner), a video-language reasoning model […]