Accuracy drops by 14 percentage points when a geometry problem is restated in vector form instead of standard coordinates.
April 23, 2026
Original Paper
Measuring Representation Robustness in Large Language Models for Geometry
arXiv · 2604.16421
The Takeaway
Large language models fail to reason consistently when the representation of a problem changes. High benchmark scores are often read as evidence that a model grasps the underlying geometric truth, but these results suggest it is frequently matching patterns tied to a specific text format: restate the same problem in vectors instead of coordinates, and the reasoning falls apart. This fragility points to a substantial gap between genuine mathematical reasoning and surface-level fluency. Reliability in critical fields like engineering requires models whose answers are invariant to how a problem is presented.
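To make concrete what "equivalent representations" means here, the sketch below states the same midpoint problem in coordinate form and in vector form and checks that the two yield the identical answer. The specific problem and values are illustrative, not drawn from the paper:

```python
import numpy as np

# Coordinate form: "Find the midpoint of A(1, 2) and B(5, 6)."
ax, ay, bx, by = 1.0, 2.0, 5.0, 6.0
midpoint_coord = ((ax + bx) / 2, (ay + by) / 2)

# Vector form of the same problem:
# "Given position vectors a and b, find m = a + (1/2)(b - a)."
a = np.array([1.0, 2.0])
b = np.array([5.0, 6.0])
midpoint_vec = a + 0.5 * (b - a)

# The two formulations are mathematically identical, so a
# representation-invariant reasoner must answer both the same way.
assert np.allclose(midpoint_coord, midpoint_vec)
print(midpoint_coord, midpoint_vec.tolist())
```

A representation-aware benchmark in the paper's spirit would pose both phrasings to a model and count the pair correct only if the answers agree; the paper's reported 14-point drop is exactly a failure of that agreement.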
From the abstract
Large language models (LLMs) are increasingly evaluated on mathematical reasoning, yet their robustness to equivalent problem representations remains poorly understood. In geometry, identical problems can be expressed in Euclidean, coordinate, or vector forms, but existing benchmarks report accuracy on fixed formats, implicitly assuming representation invariance and masking failures caused by representational changes alone. We propose GeoRepEval, a representation-aware evaluation framework that