SOMA provides a plug-and-play memory and orchestration system that increases Vision-Language-Action (VLA) robot success rates by over 50% without fine-tuning.
March 26, 2026
Original Paper
SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation
arXiv · 2603.24060
The Takeaway
Current VLAs are often brittle 'one-shot' controllers. SOMA adds long-term memory and causal failure attribution via retrieval-augmented generation (RAG) and the Model Context Protocol (MCP), letting robots learn from their own environment interactions and adapt to out-of-distribution tasks in real time.
From the abstract
Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention capability. To address this, we propose SOMA, a Strategic Orchestration and Memory-Augmented System that upgrades frozen VLA policies for robust in-context adaptation without parameter updates.
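The paper itself does not publish an API, but the core idea — a frozen policy wrapped in an episodic memory that retrieves past failures and injects them as in-context hints — can be sketched in a few lines. Everything below is hypothetical: the `Episode`, `EpisodicMemory`, and `build_context` names are illustrative, and token-overlap scoring is a toy stand-in for the RAG retriever described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One past interaction: what was attempted, what happened, what to try next."""
    task: str
    outcome: str      # "success" or a failure-attribution label
    correction: str   # hint to surface on similar future tasks


class EpisodicMemory:
    """Toy long-term memory: stores episodes and retrieves the most
    similar ones by word overlap (a stand-in for a real RAG retriever)."""

    def __init__(self):
        self.episodes = []

    def add(self, ep: Episode):
        self.episodes.append(ep)

    def retrieve(self, task: str, k: int = 2):
        def score(ep: Episode) -> int:
            return len(set(ep.task.split()) & set(task.split()))
        return sorted(self.episodes, key=score, reverse=True)[:k]


def build_context(task: str, memory: EpisodicMemory) -> str:
    """Assemble an in-context prompt for a frozen policy: the new task
    plus retrieved failure attributions and their corrections."""
    hints = [f"- past: {e.task} -> {e.outcome}; fix: {e.correction}"
             for e in memory.retrieve(task)]
    return f"Task: {task}\nRelevant memory:\n" + "\n".join(hints)


mem = EpisodicMemory()
mem.add(Episode("pick up the red mug", "failure: occluded handle",
                "approach from the left"))
mem.add(Episode("open the drawer", "success", "none"))
print(build_context("pick up the blue mug", mem))
```

The point of the sketch is that the VLA weights are never touched: adaptation happens entirely in the context assembled around the frozen policy, which is what "in-context adaptation without parameter updates" amounts to.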