AI & ML Efficiency Breakthrough

Introduces lightweight equilibration to the Muon optimizer, significantly stabilizing and accelerating LLM pretraining.

March 31, 2026

Original Paper

MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

Da Chang, Qiankun Shi, Lvgang Zhang, Yu Li, Ruijie Zhang, Yao Lu, Yongxiang Liu, Ganzhao Yuan

arXiv · 2603.28254

The Takeaway

Building on the high-performance Muon optimizer, this paper adds row/column normalization to rebalance the momentum matrix before orthogonalization. It yields faster convergence and lower perplexity in LLaMA-scale pretraining with negligible memory overhead.
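
For orientation, Muon's orthogonalized update is typically computed with a quintic Newton-Schulz iteration on the momentum matrix rather than an exact SVD. Below is a minimal NumPy sketch of that step; the coefficients match the publicly released Muon implementation, but the function name and defaults here are illustrative, not taken from this paper.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G with a quintic Newton-Schulz iteration,
    the scheme Muon uses in place of an exact SVD (coefficients from the
    public Muon code)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)   # scale so all singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                      # iterate in the wide orientation so
        X = X.T                         # the Gram matrix A stays small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```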

From the abstract

Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions mostly act either after orthogonalization by rescaling updates or before it with heavier whitening-based preconditioners. We introduce MuonEq, a lightweight family of pre-orthogonalization equilibration schemes for Muon in three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). These variants rebalance the momentum matrix before orthogonalization.
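
The abstract doesn't spell out the normalization formulas, so the following is a hedged sketch of one plausible reading: "R" divides each row of the momentum matrix by its ℓ2 norm, "C" does the same per column, and the two-sided "RC" form applies one pass of each before handing the result to the Newton-Schulz step above. The helper names (equilibrate, muoneq_step), the single-pass RC scheme, the choice of ℓ2 norm, and all hyperparameter defaults are assumptions for illustration, not details from the paper.

```python
import numpy as np

def equilibrate(M, mode="RC", eps=1e-8):
    """Hypothetical pre-orthogonalization equilibration: rebalance rows
    and/or columns of the momentum matrix M (the paper's exact norm,
    ordering, and iteration count may differ)."""
    if mode in ("R", "RC"):
        M = M / (np.linalg.norm(M, axis=1, keepdims=True) + eps)  # row norms
    if mode in ("C", "RC"):
        M = M / (np.linalg.norm(M, axis=0, keepdims=True) + eps)  # column norms
    return M

def muoneq_step(W, momentum, grad, lr=0.02, beta=0.95, mode="RC"):
    """One illustrative MuonEq-style update: accumulate heavy-ball momentum,
    equilibrate it, orthogonalize, then take a step."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(equilibrate(momentum, mode))
    return W - lr * update, momentum
```

Because the equilibration touches only row and column norms of a matrix the optimizer already stores, the extra cost is a couple of reductions per step with no persistent state, which is consistent with the negligible memory overhead noted in the takeaway.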