AI & ML Efficiency Breakthrough

Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.

March 24, 2026

Original Paper

When Negation Is a Geometry Problem in Vision-Language Models

Fawaz Sammani, Tzoulio Chamiti, Paul Gavrikov, Nikos Deligiannis

arXiv · 2603.20554

The Takeaway

Negation is a classic failure mode for models like CLIP. This paper shows that a 'negation direction' exists in the embedding space and can be manipulated at test time. That gives a training-free fix for a major multimodal limitation.
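The sketch below makes the idea of a 'negation direction' concrete. It uses difference-of-means steering, a common representation-engineering recipe. This is an illustrative assumption and not necessarily the procedure used in the paper.

The direction is estimated as the unit-normalized difference between two averages: the mean embedding of negated captions and the mean embedding of paired affirmative captions. At test time, a query embedding is shifted along that direction by a strength alpha. The helper names (negation_direction, steer), alpha, and the toy 3-D vectors are all hypothetical. The vectors stand in for real CLIP text embeddings.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (CLIP-style embeddings are compared as unit vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def negation_direction(negated: Sequence[Sequence[float]],
                       affirmative: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: unit difference of means between embeddings of
    negated captions and their paired affirmative captions."""
    diff = [n - a for n, a in zip(mean(negated), mean(affirmative))]
    return normalize(diff)


def steer(embedding: Sequence[float], direction: Sequence[float],
          alpha: float) -> Vector:
    """Hypothetical helper: shift a unit-normalized embedding by alpha along
    direction, then re-normalize so cosine scoring still applies."""
    unit = normalize(embedding)
    return normalize([e + alpha * d for e, d in zip(unit, direction)])


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real text-encoder outputs (assumption).
    negated = [[0.2, 0.9, 0.1], [0.1, 0.8, 0.3]]      # e.g. "... with no logos"
    affirmative = [[0.2, 0.1, 0.1], [0.1, 0.0, 0.3]]  # e.g. "... with logos"
    direction = negation_direction(negated, affirmative)
    query = [1.0, 0.2, 0.0]
    print("direction:", direction)
    print("before:", round(cosine(query, direction), 3))
    print("after: ", round(cosine(steer(query, direction, 0.5), direction), 3))
```

CLIP scores image-text pairs by cosine similarity, so steer re-normalizes its output. This sketch treats the sign and magnitude of alpha as free parameters. The excerpt above does not say how the paper chooses them.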

From the abstract

Joint Vision-Language Embedding models such as CLIP typically fail at understanding negation in text queries - for example, failing to distinguish "no" in the query: "a plain blue shirt with no logos". Prior work has largely addressed this limitation through data-centric approaches, fine-tuning CLIP on large-scale synthetic negation datasets. However, these efforts are commonly evaluated using retrieval-based metrics that cannot reliably reflect whether negation is actually understood. In this p