LLMs aren't 'visualizing' the mazes they solve; they are just following tokenized directions that fall apart if the layout format changes.
April 14, 2026
Original Paper
Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks
arXiv · 2604.10690
The Takeaway
Maze-solving accuracy in LLMs collapses when the same maze is presented as a visual grid instead of an adjacency list. This strongly suggests that LLMs lack a robust internal spatial model and instead rely on sophisticated pattern matching tied to the specific formatting of the input data.
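To make the format contrast concrete, here is a hedged sketch (not the paper's actual encoding, which may differ) of the same tiny maze expressed both ways, plus a breadth-first-search solver over the adjacency-list form:

```python
from collections import deque

# The same 3x3 maze in two encodings (illustrative only).
# Cells are (row, col); a move is legal only between listed neighbors.

# 1) Adjacency-list encoding: explicit neighbor relations.
adjacency = {
    (0, 0): [(0, 1)],
    (0, 1): [(0, 0), (1, 1)],
    (1, 1): [(0, 1), (2, 1)],
    (2, 1): [(1, 1), (2, 2)],
    (2, 2): [(2, 1)],
}

# 2) Visual-grid encoding: '#' = wall, '.' = open; same topology.
grid = [
    "..#",
    "#.#",
    "#..",
]

def bfs(adj, start, goal):
    """Shortest path over the adjacency list via breadth-first search."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adj[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(bfs(adjacency, (0, 0), (2, 2)))
# → [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
```

The two encodings carry identical information, so a model with a genuine spatial world model should handle either equally well; the reported collapse when only the surface format changes is what motivates the paper's pattern-matching interpretation.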
From the abstract
Foundation models have shown remarkable performance across diverse tasks, yet their ability to construct internal spatial world models for reasoning and planning remains unclear. We systematically evaluate the spatial understanding of large language models through maze tasks, a controlled testing context requiring multi-step planning and spatial abstraction. Across comprehensive experiments with Gemini-2.5-Flash, GPT-5-mini, Claude-Haiku-4.5, and DeepSeek-Chat, we uncover significant discrepancies […]