Routing-Free MoE replaces centralized routing with per-expert activation decisions, eliminating the need for Softmax, Top-K selection, and load balancing.
April 2, 2026
Original Paper
Routing-Free Mixture-of-Experts
arXiv · 2604.00801
The Takeaway
Current MoE models suffer from rigid routing biases and scaling bottlenecks. By letting each expert determine its own activation, this decentralized approach scales more robustly and performs better on heterogeneous data distributions.
From the abstract
Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates all hard-coded centralized designs (external routers, Softmax, Top-K, and load balancing), instead encapsulating activation functionality within individual experts and optimizing it directly through continuous gradient flow, so that each expert determines its activation entirely on its own. We introduce a unified adaptive …
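The excerpt does not specify the paper's exact expert-internal activation function, so the following is only a minimal sketch of the general idea: each expert owns a private, differentiable gate (here assumed to be a sigmoid over a learned projection of the input) and scales its own output by it, with no shared router, no Softmax normalization across experts, and no Top-K selection.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 8, 4

# Each expert owns both its transform and its own gate parameters.
# The gate vector "g" is an assumption standing in for whatever
# expert-internal activation mechanism the paper actually uses.
experts = [
    {
        "w": rng.standard_normal((d, d)) * 0.1,  # expert transform (MLP stand-in)
        "g": rng.standard_normal(d) * 0.1,       # private gate projection (assumed)
    }
    for _ in range(n_experts)
]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def routing_free_moe(x, experts):
    """Each expert computes its own activation strength from the input;
    outputs are summed with those independent gates as weights.
    There is no normalization or competition across experts."""
    out = np.zeros_like(x)
    gates = []
    for e in experts:
        gate = sigmoid(x @ e["g"])   # scalar in (0, 1), fully differentiable
        out += gate * (x @ e["w"])   # expert contributes in proportion to its own gate
        gates.append(gate)
    return out, np.array(gates)

x = rng.standard_normal(d)
y, gates = routing_free_moe(x, experts)
print(y.shape, gates.shape)  # (8,) (4,)
```

Because every gate is an independent sigmoid rather than a Softmax entry, the gates need not sum to one, and gradients flow to every expert on every input, which is what makes the hard Top-K selection and auxiliary load-balancing losses unnecessary in this formulation.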