Routing signatures reveal that MoE experts are highly task-specific, allowing a simple linear classifier to identify task categories with 92.5% accuracy based only on routing patterns.
arXiv · March 13, 2026 · 2603.11114
Why it matters
Challenges the assumption that sparse MoE routing is primarily a load-balancing or efficiency mechanism. It provides evidence that experts specialize at a semantic level, suggesting that routing behavior can be used as a probe for model understanding or as a trigger for task-specific downstream interventions.
From the abstract
Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation patterns across layers for a given prompt, and use them to study whether MoE routing exhibits task-conditioned structure. Using OLMoE-1B-7B-0125-Instruct as an empirical testbed, we […]
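The core idea can be sketched in a few lines: for each prompt, count how often each expert is selected at each layer, flatten those per-layer frequencies into one vector (the routing signature), and fit a linear classifier on the vectors. The sketch below uses synthetic routing data; the layer/expert counts, the task-dependent routing bias, and the `routing_signature` helper are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_layers, n_experts, top_k, n_tokens = 16, 64, 8, 32

def routing_signature(task_id):
    # Simulate top-k expert selections per layer, biased toward a
    # task-specific subset of experts to mimic task-conditioned routing.
    # (Hypothetical bias; a real signature would come from the router logits.)
    sig = np.zeros((n_layers, n_experts))
    preferred = (task_id * 7 + np.arange(top_k)) % n_experts
    for layer in range(n_layers):
        for _ in range(n_tokens):
            if rng.random() < 0.6:
                chosen = preferred  # task-biased expert subset
            else:
                chosen = rng.choice(n_experts, size=top_k, replace=False)
            sig[layer, chosen] += 1
    # Normalize to per-layer selection frequencies, flatten across layers.
    return (sig / sig.sum(axis=1, keepdims=True)).ravel()

tasks = rng.integers(0, 4, size=200)  # 4 synthetic task categories
X = np.stack([routing_signature(t) for t in tasks])
clf = LogisticRegression(max_iter=1000).fit(X[:150], tasks[:150])
acc = clf.score(X[150:], tasks[150:])
print(f"held-out accuracy: {acc:.2f}")
```

Because the signature is just a fixed-length frequency vector, any linear model suffices, which is what makes the paper's 92.5% figure striking: no nonlinear probe is needed to read task identity out of routing alone.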