Your MoE model's 'experts' aren't actually specialists in math or coding; they're just specialists in high-dimensional geometry.
April 14, 2026
Original Paper
The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise
arXiv · 2604.09780
The Takeaway
The study shows that routing in Mixture-of-Experts models reflects the geometry of the hidden-state space rather than human-defined domains such as math or coding. This challenges the common industry assumption that MoE architectures create interpretable, topic-specific sub-networks.
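To make that concrete, here is a minimal synthetic sketch, not taken from the paper: hidden states are drawn from two geometric clusters, each cluster contains tokens given a made-up "math" or "code" label, and a random linear top-k router picks experts. Expert overlap then tracks the cluster, not the label. The dimensions, cluster construction, router weights, and labels are all invented for illustration.

```python
# Toy illustration of "routing follows geometry, not domain labels".
# Everything here (clusters, labels, router) is synthetic and hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

W = rng.normal(size=(n_experts, d_model))       # random linear router
centers = rng.normal(size=(2, d_model)) * 3.0   # two geometric clusters

def top_k_experts(h):
    """Standard linear router: logits = W @ h, keep the k largest."""
    return set(np.argsort(W @ h)[-top_k:])

# Each sample: (hidden state, geometric cluster id, made-up topic label).
# Labels are assigned independently of geometry by construction.
samples = []
for cluster in (0, 1):
    for topic in ("math", "code"):
        for _ in range(40):
            h = centers[cluster] + rng.normal(scale=0.3, size=d_model)
            samples.append((h, cluster, topic))

def mean_overlap(pairs):
    """Average Jaccard overlap of top-k expert sets over sample pairs."""
    scores = []
    for (h1, *_), (h2, *_) in pairs:
        e1, e2 = top_k_experts(h1), top_k_experts(h2)
        scores.append(len(e1 & e2) / len(e1 | e2))
    return float(np.mean(scores))

same_cluster = [(a, b) for a in samples for b in samples
                if a[1] == b[1] and a is not b]
same_topic_diff_cluster = [(a, b) for a in samples for b in samples
                           if a[2] == b[2] and a[1] != b[1]]

print("overlap, same geometric cluster:     ", mean_overlap(same_cluster))           # high
print("overlap, same topic, other cluster:  ", mean_overlap(same_topic_diff_cluster))  # near chance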
From the abstract
Mixture of Experts (MoEs) are now ubiquitous in large language models, yet the mechanisms behind their "expert specialization" remain poorly understood. We show that, since MoE routers are linear maps, hidden state similarity is both necessary and sufficient to explain expert usage similarity, and specialization is therefore an emergent property of the representation space, not of the routing architecture itself. We confirm this at both token and sequence level across five pre-trained models.
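The linearity argument is straightforward to see in code. A minimal sketch, assuming the standard top-k router formulation (logits = W @ h followed by top-k selection; any particular model may differ in detail): because h -> W @ h is linear, the logit gap between two hidden states is bounded by the router's spectral norm times their distance, so nearby hidden states generically select the same experts.

```python
# Hedged sketch of the linearity argument, assuming a standard linear
# top-k router; the dimensions and weights below are invented.
import numpy as np

rng = np.random.default_rng(1)
d_model, n_experts, k = 64, 8, 2

W = rng.normal(size=(n_experts, d_model))        # linear router
spectral_norm = np.linalg.norm(W, 2)             # Lipschitz constant of h -> W @ h

h1 = rng.normal(size=d_model)
h2 = h1 + rng.normal(scale=1e-3, size=d_model)   # a nearby hidden state

logit_gap = np.linalg.norm(W @ h1 - W @ h2)
bound = spectral_norm * np.linalg.norm(h1 - h2)
print(f"logit gap {logit_gap:.2e} <= bound {bound:.2e}")   # always holds

top1 = set(np.argsort(W @ h1)[-k:])
top2 = set(np.argsort(W @ h2)[-k:])
print("same experts selected:", top1 == top2)    # True unless h1 sits near a routing tie
```

Nothing in this bound refers to the token's topic: whether the two hidden states end up with the same experts depends only on how close they are in the representation space, which is the sense in which specialization is a property of that space rather than of the router.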