A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.
arXiv · March 16, 2026 · 2603.13044
Why it matters
The result suggests that the heavy emphasis on designing domain-specific medical architectures may be unnecessary. Practitioners can likely achieve better results and easier deployment by fine-tuning powerful general-purpose vision models rather than building niche architectures.
From the abstract
Medical image segmentation (MIS) is a fundamental component of computer-assisted diagnosis and clinical decision support systems. Over the past decade, numerous architectures specifically tailored to medical imaging have emerged to address domain-specific challenges such as low contrast, small anatomical structures, and limited annotated data. In parallel, rapid progress in computer vision has produced highly capable general-purpose vision models (GP-VMs) originally designed for natural images.