Exposes a major robustness gap in Vision-Language-Action (VLA) models, where simple paraphrasing causes success-rate drops of up to 50%.
March 31, 2026
Original Paper
LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models
arXiv · 2603.28301
The Takeaway
The paper reveals that modern robotics models often latch onto specific instruction surface forms rather than grounding instructions semantically. It provides the LIBERO-Para benchmark to measure this 'linguistic generalization,' which is critical for deploying robots in human environments.
From the abstract
Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leaving robustness to paraphrased instructions underexplored. To study this gap, we introduce LIBERO-Para, a controlled benchmark that independently varies action expressions and object references for fine-
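The robustness gap the abstract describes can be quantified as the relative drop in task success when instructions are paraphrased. A minimal sketch of such a metric is below; the function names and the rollout numbers are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a paraphrase-robustness metric: the relative
# drop in task success rate under paraphrased instructions.
# All names and numbers here are illustrative, not from the paper.

def success_rate(outcomes):
    """Fraction of rollouts that succeeded (outcomes are booleans)."""
    return sum(outcomes) / len(outcomes)

def paraphrase_drop(original, paraphrased):
    """Relative success drop under paraphrasing; 0 means no degradation."""
    base = success_rate(original)
    para = success_rate(paraphrased)
    return (base - para) / base if base > 0 else 0.0

# Illustrative rollout outcomes for one task:
orig = [True] * 8 + [False] * 2   # 80% success on canonical instructions
para = [True] * 4 + [False] * 6   # 40% success on paraphrases
print(paraphrase_drop(orig, para))  # a 50% relative drop
```

A metric of this shape makes the headline number concrete: halving the success rate under paraphrasing corresponds to a drop of 0.5.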