Achieves state-of-the-art open-vocabulary segmentation using a training-free, purely geometric projection and propagation method.
March 24, 2026
Original Paper
PEARL: Geometry Aligns Semantics for Training-Free Open-Vocabulary Semantic Segmentation
arXiv · 2603.21528
The Takeaway
PEARL eliminates the need for retraining or auxiliary vision backbones by applying Procrustes alignment inside the self-attention blocks of frozen models. This allows immediate adaptation to new label sets with little added latency and no training cost.
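For readers unfamiliar with the term, the following is a minimal sketch of generic orthogonal Procrustes alignment: finding the rotation that best maps one set of feature vectors onto another. It is not the paper's implementation. This summary does not say which features PEARL aligns or how the result feeds back into self-attention, so the names `source_feats` and `target_feats`, the array shapes, and the random data are all assumptions made for illustration.

```python
import numpy as np


def orthogonal_procrustes(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F."""
    # The closed-form solution comes from the SVD of the cross-covariance
    # matrix source^T target = U S V^T, which gives R = U V^T.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Hypothetical data: 16 feature vectors of dimension 8 (illustrative only).
rng = np.random.default_rng(0)
source_feats = rng.standard_normal((16, 8))

# Build a target set that is an exact rotation of the source set.
true_rotation, _ = np.linalg.qr(rng.standard_normal((8, 8)))
target_feats = source_feats @ true_rotation

# Recover the rotation and apply it to the source features.
rotation = orthogonal_procrustes(source_feats, target_feats)
aligned_feats = source_feats @ rotation

# The recovered matrix is orthogonal and maps source onto target.
print(np.allclose(rotation @ rotation.T, np.eye(8)))
print(np.allclose(aligned_feats, target_feats))
```

Because the solution is a single SVD with no learned parameters, an alignment of this kind is consistent with the training-free claim: it can be computed at inference time on whatever features the frozen model produces.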
From the abstract
Training-free open-vocabulary semantic segmentation (OVSS) promises rapid adaptation to new label sets without retraining. Yet, many methods rely on heavy post-processing or handle text and vision in isolation, leaving cross-modal geometry underutilized. Others introduce auxiliary vision backbones or multi-model pipelines, which increase complexity and latency while compromising design simplicity. We present PEARL, Procrustes alignme