AI & ML Efficiency Breakthrough

Distills promptable visual segmentation (the Segment Anything task) into a 1.3M-parameter model for real-time in-sensor execution.

arXiv · March 13, 2026 · 2603.11917

Pietro Bonazzi, Nicola Farronato, Stefan Zihlmann, Haotong Qin, Michele Magno

Why it matters

It brings the capability of the Segment Anything Model (SAM) directly into vision sensors such as the Sony IMX500, with sub-12 ms latency. This democratizes high-quality segmentation for power-constrained IoT devices and smart glasses.

From the abstract

Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications such as smart glasses and Internet-of-Things devices. We introduce PicoSAM3, a lightweight promptable visual segmentation model optimized for edge and in-sensor execution, including deployment on the Sony IMX500 vision sensor. PicoSAM3 has 1.3M parameters and combines a dense CNN architecture with region-of-interest prompt encoding, Efficient Channel Attention, and knowledge distillation from SAM2.
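The abstract names two well-known ingredients: an Efficient Channel Attention (ECA) gate and knowledge distillation from a larger teacher (SAM2). The paper's exact implementation is not reproduced here; as a rough illustration only, the NumPy sketch below shows the generic form of each. The kernel size, the averaging weights, the temperature, and the binary cross-entropy form of the distillation loss are all assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

def eca(x, k=3):
    """Generic Efficient Channel Attention sketch.

    x: feature map of shape (C, H, W).
    k: 1-D conv kernel size (fixed here for illustration; ECA
       typically derives k adaptively from the channel count C).
    """
    C = x.shape[0]
    # Squeeze: global average pool over spatial dims -> one descriptor per channel
    y = x.mean(axis=(1, 2))                          # shape (C,)
    # Excite: lightweight 1-D convolution across neighboring channels
    # (placeholder averaging weights; these are learned in practice)
    w = np.ones(k) / k
    pad = k // 2
    y = np.convolve(np.pad(y, pad, mode="edge"), w, mode="valid")
    # Gate: sigmoid attention, broadcast back over the spatial dims
    a = 1.0 / (1.0 + np.exp(-y))
    return x * a[:, None, None]

def kd_mask_loss(student_logits, teacher_logits, T=1.0):
    """Mask-level distillation sketch: train the student to match the
    teacher's per-pixel mask probabilities (BCE form is an assumption)."""
    t = 1.0 / (1.0 + np.exp(-teacher_logits / T))    # teacher soft targets
    s = 1.0 / (1.0 + np.exp(-student_logits / T))    # student predictions
    eps = 1e-7                                       # numerical safety
    return -np.mean(t * np.log(s + eps) + (1 - t) * np.log(1 - s + eps))
```

In a real pipeline both pieces would be learned modules in a deep-learning framework; the point of the sketch is only the data flow: a per-channel squeeze-conv-gate for attention, and a soft-target loss that lets a 1.3M-parameter student imitate a much larger teacher.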