Routing-Free MoE replaces centralized routing with per-expert activation decisions, eliminating the need for Softmax, Top-K selection, and load balancing.
April 2, 2026
Original Paper
Routing-Free Mixture-of-Experts
arXiv · 2604.00801
The Takeaway
Current MoE models suffer from rigid routing biases and scaling bottlenecks. By letting each expert determine its own activation, this decentralized approach scales more robustly and performs better on heterogeneous data distributions.
From the abstract
Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates all hard-coded centralized designs (external routers, Softmax, Top-K, and load balancing), instead encapsulating activation functionality within individual experts and optimizing it directly through continuous gradient flow, so that each expert determines its activation entirely on its own. We introduce a unified adaptive …
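The excerpt does not specify the paper's exact expert-internal activation function, so the following is only a minimal sketch of the general idea: each expert owns a private, differentiable gate (here assumed to be a sigmoid over a learned projection of the input) and scales its own output by it, with no shared router, no Softmax normalization across experts, and no Top-K selection.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 8, 4

# Each expert owns both its transform and its own gate parameters.
# The gate vector "g" is an assumption standing in for whatever
# expert-internal activation mechanism the paper actually uses.
experts = [
    {
        "w": rng.standard_normal((d, d)) * 0.1,  # expert transform (MLP stand-in)
        "g": rng.standard_normal(d) * 0.1,       # private gate projection (assumed)
    }
    for _ in range(n_experts)
]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def routing_free_moe(x, experts):
    """Each expert computes its own activation strength from the input;
    outputs are summed with those independent gates as weights.
    There is no normalization or competition across experts."""
    out = np.zeros_like(x)
    gates = []
    for e in experts:
        gate = sigmoid(x @ e["g"])   # scalar in (0, 1), fully differentiable
        out += gate * (x @ e["w"])   # expert contributes in proportion to its own gate
        gates.append(gate)
    return out, np.array(gates)

x = rng.standard_normal(d)
y, gates = routing_free_moe(x, experts)
print(y.shape, gates.shape)  # (8,) (4,)
```

Because every gate is an independent sigmoid rather than a Softmax entry, the gates need not sum to one, and gradients flow to every expert on every input, which is what makes the hard Top-K selection and auxiliary load-balancing losses unnecessary in this formulation.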