AI & ML New Capability

Moves MLLMs from reactive planning toward 'mental navigation' by having them build hierarchical cognitive maps from egocentric video.

March 24, 2026

Original Paper

Mind over Space: Can Multimodal Large Language Models Mentally Navigate?

Qihui Zhu, Shouwei Ruan, Xiao Yang, Hao Jiang, Yao Huang, Shiji Zhao, Hanwei Fan, Hang Su, Xingxing Wei

arXiv · 2603.21577

The Takeaway

Standard MLLMs fail at spatial reasoning over long horizons. NavMind introduces a paradigm in which models internalize spatial representations and simulate paths before acting, with the aim of narrowing the gap between reactive AI and biological spatial intelligence.
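To make "internalize a spatial representation, then simulate a path before acting" concrete, here is a toy sketch. It is not NavMind's method: the two-level map (rooms containing landmarks), the room names, and the `simulate_path` helper are all hypothetical, and a plain breadth-first search stands in for whatever the model actually does. In the paper's setting the map would be built from egocentric video rather than written by hand.

```python
from collections import deque

# Hypothetical two-level map: rooms (coarse level) contain landmarks (fine level).
ROOM_LANDMARKS = {
    "hallway": ["coat rack"],
    "kitchen": ["fridge", "sink"],
    "living_room": ["sofa", "tv"],
    "bedroom": ["bed"],
}

# Hypothetical room adjacency (which rooms connect directly to which).
ROOM_LINKS = {
    "hallway": ["kitchen", "living_room"],
    "kitchen": ["hallway"],
    "living_room": ["hallway", "bedroom"],
    "bedroom": ["living_room"],
}


def room_of(landmark):
    """Look up the room that contains a given landmark."""
    for room, landmarks in ROOM_LANDMARKS.items():
        if landmark in landmarks:
            return room
    raise KeyError(landmark)


def simulate_path(start_landmark, goal_landmark):
    """Plan a room-level route on the stored map before taking any action.

    Uses breadth-first search, so the returned route has the fewest room
    transitions. Returns None if the goal room is unreachable.
    """
    start, goal = room_of(start_landmark), room_of(goal_landmark)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ROOM_LINKS[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


route = simulate_path("fridge", "bed")
print(" -> ".join(route))
```

The point of the sketch is the ordering: the whole route is worked out on the stored map first, and only then would an agent start moving, in contrast to a reactive planner that picks each step from the current observation alone.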

From the abstract

Despite the widespread adoption of MLLMs in embodied agents, their capabilities remain largely confined to reactive planning from immediate observations, consistently failing in spatial reasoning across extensive spatiotemporal scales. Cognitive science reveals that Biological Intelligence (BI) thrives on "mental navigation": the strategic construction of spatial representations from experience and the subsequent mental simulation of paths prior to action. To bridge the gap between AI and BI, we