Exposes a major robustness gap in Vision-Language-Action (VLA) models, where simple paraphrasing causes success-rate drops of up to 50%.
March 31, 2026
Original Paper
LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models
arXiv · 2603.28301
The Takeaway
The paper reveals that modern robotics models often latch onto specific instruction surface forms rather than grounding instructions semantically. It provides the LIBERO-Para benchmark to measure this 'linguistic generalization,' which is critical for deploying robots in human environments.
From the abstract
Vision-Language-Action (VLA) models achieve strong performance in robotic manipulation by leveraging pre-trained vision-language backbones. However, in downstream robotic settings, they are typically fine-tuned with limited data, leading to overfitting to specific instruction formulations and leaving robustness to paraphrased instructions underexplored. To study this gap, we introduce LIBERO-Para, a controlled benchmark that independently varies action expressions and object references for fine-
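The robustness gap the abstract describes can be quantified as the relative drop in task success when instructions are paraphrased. A minimal sketch of such a metric is below; the function names and the rollout numbers are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a paraphrase-robustness metric: the relative
# drop in task success rate under paraphrased instructions.
# All names and numbers here are illustrative, not from the paper.

def success_rate(outcomes):
    """Fraction of rollouts that succeeded (outcomes are booleans)."""
    return sum(outcomes) / len(outcomes)

def paraphrase_drop(original, paraphrased):
    """Relative success drop under paraphrasing; 0 means no degradation."""
    base = success_rate(original)
    para = success_rate(paraphrased)
    return (base - para) / base if base > 0 else 0.0

# Illustrative rollout outcomes for one task:
orig = [True] * 8 + [False] * 2   # 80% success on canonical instructions
para = [True] * 4 + [False] * 6   # 40% success on paraphrases
print(paraphrase_drop(orig, para))  # a 50% relative drop
```

A metric of this shape makes the headline number concrete: halving the success rate under paraphrasing corresponds to a drop of 0.5.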