Accelerates state-of-the-art 3D human mesh recovery by over 10x, enabling real-time vision-only humanoid teleoperation.
March 17, 2026
Original Paper
Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery
arXiv · 2603.15603
The Takeaway
By decoupling spatial dependencies and replacing iterative fitting with a direct feedforward mapping, Fast SAM 3D Body turns a 'seconds-per-frame' model into a real-time system. This enables direct collection of robot manipulation policies from standard RGB video without specialized hardware.
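To make the idea of decoupling concrete, here is a toy sketch in plain Python. It is not the paper's implementation. It assumes, purely for illustration, that the serial dependency is of the form "hand crops can only be located after the body crop has been processed"; the summary does not spell out the actual dependency. Every name in it (`extract_features`, `crop_region`, `serial_pipeline`, `decoupled_pipeline`, `locate_hands`) is hypothetical, and the "encoder" is just a mean over pixel values.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_features(crop):
    """Hypothetical stand-in for an image encoder: mean pixel value of a crop."""
    pixels = [p for row in crop for p in row]
    return sum(pixels) / len(pixels)


def crop_region(image, box):
    """Cut box = (top, left, height, width) out of an image stored as a 2D list."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def serial_pipeline(image, body_box, locate_hands):
    """Coupled version (assumed): hand boxes depend on the body result,
    so the body crop must be fully processed before hand crops can start."""
    body_feat = extract_features(crop_region(image, body_box))
    hand_boxes = locate_hands(body_feat)  # blocks on body_feat
    hand_feats = [extract_features(crop_region(image, b)) for b in hand_boxes]
    return [body_feat] + hand_feats


def decoupled_pipeline(image, boxes):
    """Decoupled version: all boxes are known up front, so the crops are
    independent of each other and can be processed concurrently."""
    crops = [crop_region(image, b) for b in boxes]
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(extract_features, crops))


# Toy 4x4 "image" with pixel values 0..15; one body box and two hand boxes.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
boxes = [(0, 0, 4, 4), (0, 0, 2, 2), (2, 2, 2, 2)]

serial = serial_pipeline(image, boxes[0], lambda _body_feat: boxes[1:])
parallel = decoupled_pipeline(image, boxes)
print(serial == parallel)  # both orderings compute the same features
```

Both functions return the same features; the difference is only in scheduling. In a real GPU model the independent crops would more likely be stacked into a single batch than dispatched to threads, which serve here only as a stand-in for parallel execution.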
From the abstract
SAM 3D Body (3DB) achieves state-of-the-art accuracy in monocular 3D human mesh recovery, yet its inference latency of several seconds per image precludes real-time application. We present Fast SAM 3D Body, a training-free acceleration framework that reformulates the 3DB inference pathway to achieve interactive rates. By decoupling serial spatial dependencies and applying architecture-aware pruning, we enable parallelized multi-crop feature extraction and streamlined transformer decoding. Moreov