Segment Anything Reasoner (StAR) successfully introduces parallel test-time scaling to visual segmentation tasks, eliciting latent reasoning capabilities from base models.
arXiv · March 17, 2026 · 2603.14382
The Takeaway
It shows that 'compute-at-inference' (search/rollout) strategies used in LLMs (like OpenAI's o1) transfer to computer vision. This lets models solve complex, multi-step reasoning queries for object localization that standard feed-forward segmentation models fail on.
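The core idea behind parallel test-time scaling is simple: instead of one feed-forward prediction, sample several independent rollouts and keep the best-scoring one. The sketch below illustrates this best-of-N pattern with toy stand-ins (a 1-D "mask" proposer and an IoU scorer against a hidden target); the proposer, scorer, and all names here are illustrative assumptions, not the StAR paper's actual pipeline.

```python
import random

def best_of_n_rollouts(propose_fn, score_fn, n=8, seed=0):
    """Parallel test-time scaling: sample n independent rollouts
    and keep the highest-scoring candidate (best-of-N selection)."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n):
        cand = propose_fn(rng)
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score

# Toy stand-ins: a "rollout" proposes a 1-D segment mask; the scorer
# is IoU against a hidden target. Both are placeholders for the real
# segmentation model and verifier.
TARGET = {3, 4, 5, 6}

def propose(rng):
    start = rng.randint(0, 6)          # random 4-cell window
    return set(range(start, start + 4))

def iou(mask):
    return len(mask & TARGET) / len(mask | TARGET)

mask, score = best_of_n_rollouts(propose, iou, n=16)
```

With more rollouts (larger `n`), the selected candidate's score can only improve or stay the same, which is why extra inference compute buys accuracy on hard queries.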
From the abstract
As AI systems are integrated more rapidly into diverse and complex real-world environments, the ability to perform holistic reasoning over an implicit query and an image to localize a target is becoming increasingly important. However, recent reasoning segmentation methods fail to sufficiently elicit the visual reasoning capabilities of the base model. In this work, we present Segment Anything Reasoner (StAR), a comprehensive framework that refines the design space from multiple perspectives.