Large language models default to English-centric spatial logic even when they are speaking Japanese or Swahili.
April 29, 2026
Original Paper
Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives
arXiv · 2604.25423
The Takeaway
AI models appear fluent in many languages, but they lack the cultural and physical context that shapes human speech. This study tested how models use words like "this" and "that" to describe objects in physical space. The results show that the models fail to capture the distance-based distinctions that different cultures draw, defaulting instead to a generic English-like pattern. This suggests that despite their massive training data, these models do not actually perceive the world or understand human perspective. They are simply mimicking word frequencies without any internal map of the physical environment.
From the abstract
Do large language models (LLMs) truly acquire embodied cognition and cultural conventions from text? We introduce demonstratives, fundamental spatial expressions like "this/that" in English and "zhè/nà" in Chinese, as a novel probe for grounded knowledge. Using 6,400 responses from 320 native speakers, we establish a human baseline: English speakers reliably distinguish proximal-distal referents but struggle with perspective-taking, while Chinese speakers switch perspectives fluently but tolerate…
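
For a concrete sense of what such a probe could look like, here is a minimal sketch of testing an LLM's demonstrative choice against a human baseline. The distances, prompt wording, query_model callable, and toy baseline below are illustrative assumptions, not the paper's actual protocol or data.

    # Illustrative sketch (not the paper's protocol): probe a model's demonstrative
    # choice at varying speaker-object distances and compare it to a human baseline.
    from typing import Callable

    # Hypothetical distances (in meters) between the speaker and the object.
    DISTANCES = [0.3, 0.5, 1.0, 2.0, 4.0, 8.0]

    # Hypothetical prompt; the paper's stimuli are not reproduced here.
    PROMPT_TEMPLATE = (
        "You are standing in a room. A cup is {dist} meters away from you. "
        "Complete the sentence with exactly one word, 'this' or 'that': "
        "'Please hand me ___ cup.'"
    )

    def probe(query_model: Callable[[str], str]) -> dict[float, str]:
        """Ask the model for a demonstrative at each distance.

        query_model is any callable that sends a prompt to an LLM and
        returns its text reply (e.g. a thin wrapper around a chat API).
        """
        choices = {}
        for dist in DISTANCES:
            reply = query_model(PROMPT_TEMPLATE.format(dist=dist)).lower()
            choices[dist] = "this" if "this" in reply else "that"
        return choices

    def agreement(model_choices: dict[float, str],
                  human_baseline: dict[float, str]) -> float:
        """Fraction of distances where the model matches the human majority choice."""
        matches = sum(model_choices[d] == human_baseline[d] for d in DISTANCES)
        return matches / len(DISTANCES)

    if __name__ == "__main__":
        # Toy baseline: proximal "this" within ~1 m, distal "that" beyond (illustrative only).
        human_baseline = {d: ("this" if d <= 1.0 else "that") for d in DISTANCES}
        # Stand-in for a real model call, so the sketch runs end to end.
        mock_model = lambda prompt: "that"
        choices = probe(mock_model)
        print(choices)
        print(f"agreement with baseline: {agreement(choices, human_baseline):.0%}")

In a real evaluation, query_model would wrap an actual model endpoint, the prompts would be translated per language, and the baseline would come from the elicited native-speaker responses rather than a hard-coded cutoff.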