Swim2Real uses a vision-language model (VLM) as closed-loop feedback to calibrate a robotic fish simulator directly from video.
March 24, 2026
Original Paper
Swim2Real: VLM-Guided System Identification for Sim-to-Real Transfer
arXiv · 2603.20827
The Takeaway
Swim2Real replaces hand-designed sim-to-real calibration pipelines with an automated VLM feedback loop that interprets visual discrepancies in swimming videos to tune 16 simulation parameters. This enables zero-shot transfer for soft aquatic robots, where simplified fluid models typically leave a persistent sim-to-real gap.
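To make the idea of a closed feedback loop concrete, here is a minimal sketch of how such a calibration loop could be structured. It is not the paper's method: the simulator, the feedback function, and the update rule are all hypothetical stand-ins. In particular, `stub_vlm_feedback` replaces the VLM with a simple numeric comparison that returns coarse per-parameter hints, and `toy_simulator` replaces the fish simulator with a made-up monotonic function. Only the parameter count (16) comes from the source.

```python
from typing import List, Sequence

N_PARAMS = 16  # from the source; every other name and value here is hypothetical


def toy_simulator(params: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the fish simulator.

    Maps each parameter to one observable feature; monotonic for p >= 0.
    """
    return [p * p + 0.5 * p for p in params]


def stub_vlm_feedback(sim_obs: Sequence[float],
                      real_obs: Sequence[float],
                      tol: float = 1e-3) -> List[str]:
    """Stub for the VLM: returns a coarse textual hint per parameter.

    A real system would compare videos; this stub compares numbers.
    """
    hints = []
    for s, r in zip(sim_obs, real_obs):
        if abs(s - r) <= tol:
            hints.append("ok")
        elif s < r:
            hints.append("increase")
        else:
            hints.append("decrease")
    return hints


def calibrate(real_obs: Sequence[float],
              init_params: Sequence[float],
              step: float = 0.5,
              max_rounds: int = 200) -> List[float]:
    """Closed loop: simulate, ask for feedback, nudge parameters, repeat."""
    params = list(init_params)
    steps = [step] * len(params)
    last_hint = ["ok"] * len(params)
    for _ in range(max_rounds):
        hints = stub_vlm_feedback(toy_simulator(params), real_obs)
        if all(h == "ok" for h in hints):
            break  # feedback reports no remaining discrepancy
        for i, h in enumerate(hints):
            if h == "ok":
                continue
            # Halve the step whenever the hint flips direction (overshoot).
            if last_hint[i] != "ok" and h != last_hint[i]:
                steps[i] *= 0.5
            params[i] += steps[i] if h == "increase" else -steps[i]
            params[i] = max(params[i], 0.0)  # keep the toy model monotonic
            last_hint[i] = h
    return params
```

As a usage example, generating `real_obs` from known parameters with `toy_simulator` and calling `calibrate(real_obs, [1.0] * N_PARAMS)` should recover those parameters to within the feedback tolerance. The step-halving rule is one simple way to turn directional hints into a convergent search; a real pipeline would need to handle noisy and coupled feedback.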
From the abstract
We present Swim2Real, a pipeline that calibrates a 16-parameter robotic fish simulator from swimming videos using vision-language model (VLM) feedback, requiring no hand-designed search stages. Calibrating soft aquatic robots is particularly challenging because nonlinear fluid-structure coupling makes the parameter landscape chaotic, simplified fluid models introduce a persistent sim-to-real gap, and controlled aquatic experiments are difficult to reproduce. Prior work on this platform required