Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.
March 24, 2026
Original Paper
RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
arXiv · 2603.20527
The Takeaway
RMNP replaces the expensive Newton-Schulz iterations used by Muon with simple row-wise L2 normalization. For an m × n weight matrix, this reduces the per-iteration cost of second-order-like preconditioning from cubic in the matrix dimensions, O(mn · min(m, n)), to linear in the number of entries, O(mn), making advanced optimization practical for massive models.
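To make the idea concrete, here is a minimal sketch of what a row-normalized momentum step might look like. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`row_normalize`, `rmnp_like_step`), the momentum form, the hyperparameters `lr` and `beta`, and the `eps` guard are all hypothetical, since the excerpt does not give the exact update rule. It uses only the Python standard library so it runs anywhere.

```python
import math


def row_normalize(matrix, eps=1e-8):
    """Scale each row of a 2-D list to (approximately) unit L2 norm.

    eps is an assumed guard against division by zero for all-zero rows.
    Cost is one pass over the entries, i.e. O(m * n) for an m x n matrix.
    """
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / (norm + eps) for x in row])
    return out


def rmnp_like_step(weights, grads, momentum, lr=0.02, beta=0.95):
    """One hypothetical update: accumulate momentum, then normalize its rows.

    Assumption: the row normalization takes the place of the Newton-Schulz
    orthogonalization that Muon applies to the momentum matrix. The exact
    RMNP update (scaling, weight decay, etc.) may differ.
    """
    new_momentum = [
        [beta * m + g for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(momentum, grads)
    ]
    direction = row_normalize(new_momentum)
    new_weights = [
        [w - lr * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weights, direction)
    ]
    return new_weights, new_momentum


# Toy usage on a 2 x 2 weight matrix with zero initial momentum.
W = [[1.0, 2.0], [3.0, 4.0]]
G = [[3.0, 4.0], [0.0, 0.5]]
M = [[0.0, 0.0], [0.0, 0.0]]
W, M = rmnp_like_step(W, G, M)
```

Because each row is rescaled independently, the step needs no matrix-matrix products, which is where the cost of a Newton-Schulz iteration comes from.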
From the abstract
Preconditioned adaptive methods have gained significant attention for training deep neural networks, as they capture rich curvature information of the loss landscape. The central challenge in this field lies in balancing preconditioning effectiveness with computational efficiency of implementing the preconditioner. Among recent advances, Muon stands out by using Newton-Schulz iteration to obtain preconditioned updates without explicitly constructing the preconditioning matrix. Despite