A new framework can reconstruct the layout of a room and the movement of people using only the sensors in a smartwatch or earbud.
April 24, 2026
Original Paper
Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs
arXiv · 2604.21926
The Takeaway
This system does not use cameras or microphones to see. Instead, it relies on inertial measurement units (IMUs), the accelerometers and gyroscopes built into wearables, which measure acceleration and rotation rather than images. By analyzing the subtle movements of a wearable device, the AI can map out the 3D space around the user and reconstruct their motion over time, giving a 4D (space plus time) picture. This capability is counterintuitive because it turns a simple motion sensor into a spatial mapping tool. It means that privacy-sensitive areas without cameras can still be monitored just by tracking the motion of a wrist. The technology could change how we track physical activity or safety in the workplace without intrusive surveillance. It also opens up new possibilities for augmented reality without needing glasses.
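To get a feel for why motion sensors carry spatial information, consider the classical baseline the paper improves on: dead reckoning, where accelerometer readings are integrated twice to recover a trajectory. The sketch below is illustrative only, assuming gravity-compensated 3-axis acceleration and a fixed sample rate; raw integration like this drifts quickly, which is part of why the paper turns to learned models instead.

```python
# Minimal dead-reckoning sketch: twice-integrate 3-axis IMU acceleration
# into velocity and position. Function name, units (m/s^2, seconds), and
# the drift-free input are illustrative assumptions, not the paper's pipeline.

def dead_reckon(accel_samples, dt, v0=(0.0, 0.0, 0.0), p0=(0.0, 0.0, 0.0)):
    """Return the position trajectory implied by acceleration samples
    taken every dt seconds, starting from velocity v0 and position p0."""
    v = list(v0)
    p = list(p0)
    trajectory = [tuple(p)]
    for a in accel_samples:
        for i in range(3):
            v[i] += a[i] * dt   # first integral: acceleration -> velocity
            p[i] += v[i] * dt   # second integral: velocity -> position
        trajectory.append(tuple(p))
    return trajectory

# Example: a steady 1 m/s^2 push along x for 1 second at 100 Hz
# traces out roughly half a meter of travel.
samples = [(1.0, 0.0, 0.0)] * 100
path = dead_reckon(samples, dt=0.01)
```

In practice, sensor noise and bias make the double integral diverge within seconds, so systems like the one described here learn to infer motion and scene structure from the IMU signal statistically rather than by pure integration.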
From the abstract
Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision. Its goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. For this we introduce IMU-to-4D, a framework that repurposes large language models for non-visual spatiotemporal understanding of human-scene d