AI & ML Efficiency Breakthrough

INSID3 achieves state-of-the-art one-shot image segmentation using only frozen DINOv3 features without any training, fine-tuning, or auxiliary models.

March 31, 2026

Original Paper

INSID3: Training-Free In-Context Segmentation with DINOv3

Claudia Cuttano, Gabriele Trivigno, Christoph Reich, Daniel Cremers, Carlo Masone, Stefan Roth

arXiv · 2603.28480

The Takeaway

INSID3 demonstrates that scaled-up self-supervised features contain enough spatial and semantic structure to outperform complex supervised pipelines. Practitioners can thus perform high-quality segmentation at any granularity (objects, parts, or instances) with 3x fewer parameters than current methods.
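To make the training-free idea concrete, here is a minimal sketch of one-shot segmentation by feature correspondence: average the annotated reference patches into a concept prototype, then threshold each query patch's cosine similarity to it. This illustrates the general recipe only, not the paper's exact method; the random arrays below are stand-ins for DINOv3 patch features, and `prototype_segment` and its threshold are hypothetical names/values.

```python
import numpy as np

def prototype_segment(ref_feats, ref_mask, query_feats, threshold=0.5):
    """Training-free one-shot segmentation via feature correspondence.

    ref_feats:   (N, D) patch features of the annotated reference image
    ref_mask:    (N,) boolean mask marking the concept in the reference
    query_feats: (M, D) patch features of the query image
    Returns a boolean (M,) mask over the query patches.
    """
    # L2-normalize so dot products become cosine similarities.
    ref = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    qry = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)

    # Average the masked reference patches into a single concept prototype.
    prototype = ref[ref_mask].mean(axis=0)
    prototype /= np.linalg.norm(prototype)

    # Cosine similarity of every query patch to the prototype, thresholded.
    sims = qry @ prototype
    return sims > threshold

# Toy demo: two well-separated synthetic feature clusters stand in for
# foreground/background DINOv3 patch embeddings.
rng = np.random.default_rng(0)
ref_feats = np.vstack([rng.normal(1.0, 0.1, (8, 16)),    # "foreground"
                       rng.normal(-1.0, 0.1, (8, 16))])  # "background"
ref_mask = np.array([True] * 8 + [False] * 8)
query_feats = np.vstack([rng.normal(1.0, 0.1, (4, 16)),
                         rng.normal(-1.0, 0.1, (12, 16))])
pred = prototype_segment(ref_feats, ref_mask, query_feats)
print(pred.sum())  # number of query patches matched to the concept
```

In practice the patch features would come from a frozen DINOv3 backbone, and the similarity map would be upsampled back to pixel resolution; the prototype-plus-cosine step is the entire "model", with no training or fine-tuning involved.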

From the abstract

In-context segmentation (ICS) aims to segment arbitrary concepts, e.g., objects, parts, or personalized instances, given a single annotated visual example. Existing work relies on (i) fine-tuning vision foundation models (VFMs), which improves in-domain results but harms generalization, or (ii) combining multiple frozen VFMs, which preserves generalization but yields architectural complexity and fixed segmentation granularities. We revisit ICS from a minimalist perspective and ask: Can a single self-