Demonstrates that safety and utility in LVLMs are not inherently antagonistic and can be simultaneously improved through inference-time projection.
arXiv · March 17, 2026 · 2603.14825
The Takeaway
It challenges the widely held 'alignment tax' assumption by showing that identifying and removing a specific modality-induced bias direction improves both jailbreak defense and performance on general reasoning tasks, with no training required.
From the abstract
Existing jailbreak defense frameworks for Large Vision-Language Models often suffer from a safety-utility trade-off, where strengthening safety inadvertently degrades performance on general visually grounded reasoning tasks. In this work, we investigate whether safety and utility are inherently antagonistic objectives. We focus on a modality-induced bias direction consistently observed across datasets, which arises from suboptimal coupling between the Large Language Model backbone and the visual encoder.
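The excerpt does not spell out how the bias direction is estimated or where the projection is applied, so the following is only a minimal sketch of the general technique, assuming the direction is a unit vector in hidden-state space (here estimated as a difference of mean activations between image-conditioned and text-only prompts, a hypothetical recipe) and that it is removed by orthogonal projection through a forward hook on one transformer layer. The layer path and helper names are placeholders, not the paper's API.

```python
import torch

def estimate_bias_direction(acts_with_image: torch.Tensor,
                            acts_text_only: torch.Tensor) -> torch.Tensor:
    """Estimate a single bias direction as the normalized difference of
    mean hidden states between image-conditioned and text-only prompts.
    (Hypothetical estimator; the paper's exact procedure is not in the
    excerpt.)"""
    direction = acts_with_image.mean(dim=0) - acts_text_only.mean(dim=0)
    return direction / direction.norm()

def project_out(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along the unit vector `direction`:
    h' = h - (h . d) d. Works on (..., d_model)-shaped activations."""
    coeff = hidden @ direction                  # (..., ) projection coefficients
    return hidden - coeff.unsqueeze(-1) * direction

def make_hook(direction: torch.Tensor):
    """Build a forward hook that applies the projection at inference time,
    with no weight updates anywhere in the model."""
    def hook(module, inputs, output):
        if isinstance(output, tuple):           # many transformer layers return tuples
            return (project_out(output[0], direction),) + output[1:]
        return project_out(output, direction)
    return hook

# Hypothetical usage: attach to one decoder layer of the LLM backbone.
# layer = model.language_model.model.layers[k]   # placeholder layer path
# handle = layer.register_forward_hook(make_hook(bias_dir))
# ... run generation; call handle.remove() to restore the original model.
```

Because the projection is a hook rather than a weight edit, it can be toggled per request and composes with any decoding strategy, which is consistent with the "zero training" framing above.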