Physics · Nature Is Weird

An AI model trained on one speaker's MRI scans can now estimate the shape of the entire vocal tract, from the glottis to the lips, just from the sound of that speaker's voice.

March 31, 2026

Original Paper

Acoustic-to-articulatory Inversion of the Complete Vocal Tract from RT-MRI with Various Audio Embeddings and Dataset Sizes

Sofiane Azzouz, Pierre-André Vuissoz, Yves Laprie

arXiv · 2603.28723

The Takeaway

By training on about 3.5 hours of real-time MRI (RT-MRI) recordings of a single speaker, a new system learned to "see" how the glottis, tongue, throat, and lips move over time, using only the audio. This suggests that the speech signal carries enough information to recover the shape of the whole vocal tract, at least for the speaker the system was trained on.
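The Takeaway describes a model that learns to map audio to articulator shapes. The sketch below shows that idea in its simplest possible form. It assumes each audio frame has already been turned into an embedding vector and each MRI frame into articulator contours in millimetres, and it uses nearest-neighbour (codebook) lookup as a deliberately simple stand-in. This is not the authors' model, which this excerpt does not describe; the function names, the toy data, and the RMSE error measure are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical representation of one MRI frame:
# articulator name -> contour points (x, y) in millimetres.
Shape = Dict[str, List[Point]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two audio embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def invert_nearest_neighbour(
    train_audio: List[List[float]],
    train_shapes: List[Shape],
    query_audio: List[float],
) -> Shape:
    """Codebook-style inversion: copy the shape of the closest training frame."""
    best = min(range(len(train_audio)),
               key=lambda i: euclidean(train_audio[i], query_audio))
    return train_shapes[best]


def shape_rmse(predicted: Shape, reference: Shape) -> float:
    """RMS point-to-point distance over every articulator contour (mm)."""
    squared = []
    for name, ref_points in reference.items():
        for (px, py), (rx, ry) in zip(predicted[name], ref_points):
            squared.append((px - rx) ** 2 + (py - ry) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Toy (hypothetical) data: 2-D audio embeddings paired with two-articulator shapes.
train_audio = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
train_shapes = [
    {"tongue": [(0.0, 0.0), (10.0, 5.0)], "lips": [(40.0, 0.0)]},
    {"tongue": [(0.0, 2.0), (10.0, 8.0)], "lips": [(40.0, 3.0)]},
    {"tongue": [(0.0, 1.0), (10.0, 6.0)], "lips": [(40.0, 1.5)]},
]

predicted = invert_nearest_neighbour(train_audio, train_shapes, [0.9, 0.1])
reference = {"tongue": [(0.0, 2.0), (10.0, 7.0)], "lips": [(40.0, 3.0)]}
print(predicted["tongue"])                        # [(0.0, 2.0), (10.0, 8.0)]
print(round(shape_rmse(predicted, reference), 3))  # 0.577
```

A lookup like this can only return shapes it has already seen, which is one reason the amount of training data matters for inversion: more frames cover more of the speaker's articulatory space. A learned regressor can interpolate between training shapes instead.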

From the abstract

Acoustic-to-articulatory inversion strongly depends on the type of data used. While most previous studies rely on EMA, which is limited by the number of sensors and restricted to accessible articulators, we propose an approach aiming at a complete inversion of the vocal tract, from the glottis to the lips. To this end, we used approximately 3.5 hours of RT-MRI data from a single speaker. The innovation of our approach lies in the use of articulator contours automatically extracted from MRI image