FineRMoE extends MoE granularity to both intermediate and output dimensions, achieving a 136x increase in decoding throughput.
March 17, 2026
Original Paper
FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach
arXiv · 2603.13364
The Takeaway
FineRMoE overcomes the performance plateau of traditional fine-grained MoEs by expanding expert granularity along the output dimension as well as the intermediate one. Its accompanying upcycling method lets researchers convert existing models into this higher-efficiency architecture without retraining from scratch.
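To make the "both dimensions" idea concrete, here is a minimal numpy sketch of how an FFN expert can be partitioned along the intermediate dimension (the conventional fine-grained split) and additionally along the output dimension. All names, shapes, and granularity factors are illustrative assumptions, not the paper's implementation; routing and upcycling are omitted, so every sub-unit stays active and the dense output is recovered exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
g_inter, g_out = 4, 2  # illustrative granularity along intermediate / output dims

# One dense FFN expert: y = relu(x @ W1) @ W2
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))
x = rng.standard_normal(d_model)
y_dense = np.maximum(x @ W1, 0.0) @ W2

# Conventional fine-grained split: each of g_inter sub-experts owns a
# d_ff // g_inter slice of the intermediate dimension.
w_inter = d_ff // g_inter
inter_sum = sum(
    np.maximum(x @ W1[:, i * w_inter:(i + 1) * w_inter], 0.0)
    @ W2[i * w_inter:(i + 1) * w_inter, :]
    for i in range(g_inter)
)
assert np.allclose(inter_sum, y_dense)

# Dual-dimension split (sketch): a finer-grained unit is indexed by an
# (intermediate slice, output slice) pair, giving g_inter * g_out units.
w_out = d_model // g_out
blocks = np.zeros_like(y_dense)
for i in range(g_inter):
    s = slice(i * w_inter, (i + 1) * w_inter)
    h = np.maximum(x @ W1[:, s], 0.0)          # this unit's hidden slice
    for j in range(g_out):
        o = slice(j * w_out, (j + 1) * w_out)
        blocks[o] += h @ W2[s, o]              # contribution of unit (i, j)
assert np.allclose(blocks, y_dense)
```

The point of the second loop is that slicing `W2` along its columns multiplies the number of independently routable units from `g_inter` to `g_inter * g_out`; a router would then activate only a subset of these blocks per token.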
From the abstract
As revealed by the scaling law of fine-grained MoE, model performance ceases to improve once the granularity of the intermediate dimension exceeds the optimal threshold, limiting further gains from single-dimension fine-grained design. To address this bottleneck, we propose FineRMoE (FineR-Grained MoE), an architecture that extends fine-grained expert design to both intermediate and output dimensions, aiming to enhance expert specialization beyond the single-dimension limit. We further intro…