Breaks the long-standing accuracy-robustness trade-off in VLMs by localizing adversarial robustness to shallow layers.
arXiv · March 16, 2026 · 2603.12799
Why it matters
The paper finds that robustness is primarily a shallow-layer phenomenon driven by low-frequency spectral bias. By freezing the pre-trained weights and adapting only the initial layers (R-Adapt), models can be made robust without the typical 10-20% drop in clean accuracy; a rough sketch of this freeze-then-adapt setup follows below.
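As a minimal sketch of the idea (not the paper's actual R-Adapt implementation), the snippet below freezes an open_clip ViT-B/32 vision tower and re-enables gradients only for its first few transformer blocks. The split point `num_shallow_layers`, the learning rate, and the adversarial objective are all illustrative assumptions.

```python
import torch
import open_clip

# Load a CLIP-style VLM vision/text backbone (assumed model choice).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai"
)

# 1) Freeze every pre-trained weight.
for p in model.parameters():
    p.requires_grad = False

# 2) Unfreeze only the shallow transformer blocks of the vision encoder,
#    where the paper localizes adversarial robustness.
num_shallow_layers = 3  # hypothetical choice; tune per model
for block in model.visual.transformer.resblocks[:num_shallow_layers]:
    for p in block.parameters():
        p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)

# Adversarial fine-tuning (e.g. PGD-perturbed images) would then update
# only these shallow parameters, leaving the rest of the VLM untouched.
```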
From the abstract
Achieving adversarial robustness in Vision-Language Models (VLMs) inevitably compromises accuracy on clean data, presenting a long-standing and challenging trade-off. In this work, we revisit this trade-off by investigating a fundamental question: What makes VLMs robust? Through a detailed analysis of adversarially fine-tuned models, we examine how robustness mechanisms function internally and how they interact with clean accuracy. Our analysis reveals that adversarial robustness is not uniformly distributed across the network, but is concentrated in its shallow layers. […]