Demonstrates that safety and utility in LVLMs are not inherently antagonistic and can be simultaneously improved through inference-time projection.
arXiv · March 17, 2026 · 2603.14825
The Takeaway
It challenges the widely held 'alignment tax' assumption by showing that identifying and removing a specific modality-induced bias direction improves both jailbreak defense and performance on general reasoning tasks, with no training required.
From the abstract
Existing jailbreak defense frameworks for Large Vision-Language Models often suffer from a safety-utility trade-off, where strengthening safety inadvertently degrades performance on general visually grounded reasoning tasks. In this work, we investigate whether safety and utility are inherently antagonistic objectives. We focus on a modality-induced bias direction consistently observed across datasets, which arises from suboptimal coupling between the Large Language Model backbone and the visual encoder.
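The excerpt does not spell out how the bias direction is estimated or where the projection is applied, so the following is only a minimal sketch of the general technique, assuming the direction is a unit vector in hidden-state space (here estimated as a difference of mean activations between image-conditioned and text-only prompts, a hypothetical recipe) and that it is removed by orthogonal projection through a forward hook on one transformer layer. The layer path and helper names are placeholders, not the paper's API.

```python
import torch

def estimate_bias_direction(acts_with_image: torch.Tensor,
                            acts_text_only: torch.Tensor) -> torch.Tensor:
    """Estimate a single bias direction as the normalized difference of
    mean hidden states between image-conditioned and text-only prompts.
    (Hypothetical estimator; the paper's exact procedure is not in the
    excerpt.)"""
    direction = acts_with_image.mean(dim=0) - acts_text_only.mean(dim=0)
    return direction / direction.norm()

def project_out(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along the unit vector `direction`:
    h' = h - (h . d) d. Works on (..., d_model)-shaped activations."""
    coeff = hidden @ direction                  # (..., ) projection coefficients
    return hidden - coeff.unsqueeze(-1) * direction

def make_hook(direction: torch.Tensor):
    """Build a forward hook that applies the projection at inference time,
    with no weight updates anywhere in the model."""
    def hook(module, inputs, output):
        if isinstance(output, tuple):           # many transformer layers return tuples
            return (project_out(output[0], direction),) + output[1:]
        return project_out(output, direction)
    return hook

# Hypothetical usage: attach to one decoder layer of the LLM backbone.
# layer = model.language_model.model.layers[k]   # placeholder layer path
# handle = layer.register_forward_hook(make_hook(bias_dir))
# ... run generation; call handle.remove() to restore the original model.
```

Because the projection is a hook rather than a weight edit, it can be toggled per request and composes with any decoding strategy, which is consistent with the "zero training" framing above.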