AI & ML Scaling Insight

Hidden states in LLMs occupy a Riemannian submanifold on which tokens correspond to Voronoi regions, revealing a universal 'hourglass' intrinsic-dimension profile across all tested models.

March 25, 2026

Original Paper

Latent Semantic Manifolds in Large Language Models

Mohamed A. Mabrok

arXiv · 2603.22301

The Takeaway

The paper provides a geometric explanation for why quantization works (or fails) and identifies a persistent 'hard core' of representations near the Voronoi boundaries. This offers a theoretical basis for improving model compression and decoding strategies.
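The quantization claim has a simple geometric reading: a hidden state decodes to the token whose Voronoi cell it lies in, so quantization noise is harmless when the state sits deep inside its cell and harmful when it sits near a boundary. Below is a minimal toy sketch of that intuition with a random unembedding matrix and a uniform fake-quantizer; the vocabulary size, hidden dimension, and `fake_quantize` helper are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 1000, 64                     # hypothetical vocab size and hidden dim
U = rng.standard_normal((V, D))     # stand-in unembedding matrix (token directions)
h = rng.standard_normal(D)          # a stand-in hidden state

def token_cell(h, U):
    """Voronoi cell of h under dot-product decoding:
    the argmax token of the logits U @ h."""
    return int(np.argmax(U @ h))

def fake_quantize(x, bits=4):
    """Toy uniform symmetric quantization to `bits` bits
    (a stand-in for real weight/activation quantization)."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

before = token_cell(h, U)
after = token_cell(fake_quantize(h, bits=4), U)
# If h lies deep inside its Voronoi region, the decoded token survives
# quantization; near a cell boundary, the rounding noise can flip it.
```

Comparing `before` and `after` over many states would estimate how often quantization crosses a cell boundary, which is one way to read the "works or fails" dichotomy geometrically.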

From the abstract

Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states as points on a latent semantic manifold: a Riemannian submanifold equipped with the Fisher information metric, where tokens correspond to Voronoi regions partitioning the manifold. We define the expressibility gap, a geometric measure…
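The 'hourglass' finding in the headline concerns how the intrinsic dimension of hidden states varies across layers. One standard way to measure intrinsic dimension from a point cloud is the TwoNN estimator, which uses only the ratio of each point's second- to first-nearest-neighbor distance. The sketch below is a generic TwoNN implementation applied to synthetic data, not the paper's specific method or data.

```python
import numpy as np

def twonn_id(X):
    """TwoNN maximum-likelihood intrinsic-dimension estimate.

    For each point, mu = r2 / r1 (second- over first-nearest-neighbor
    distance); under the TwoNN model mu follows a Pareto law with
    exponent d, giving the MLE d = N / sum(log mu)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)     # exclude self-distances
    D.sort(axis=1)
    mu = D[:, 1] / D[:, 0]          # r2 / r1 for each point
    return len(mu) / np.log(mu).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))  # 1000 points from a 5-D Gaussian
est = twonn_id(X)                   # estimate should land near 5
```

Running an estimator like this on hidden states layer by layer is how a dimension-versus-depth profile (such as the reported hourglass shape) would typically be traced.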