Mathematical proof that LayerNorm structurally reduces model complexity compared to RMSNorm due to its mean-centering geometry.
March 31, 2026
Original Paper
The Geometric Cost of Normalization: Affine Bounds on the Bayesian Complexity of Neural Networks
arXiv · 2603.27432
The Takeaway
Using Singular Learning Theory, the paper quantifies the Bayesian-complexity cost of normalization, proving that LayerNorm reduces the Local Learning Coefficient (LLC) by exactly m/2 while RMSNorm preserves it. This gives a theoretical basis for choosing a normalization layer based on the desired model capacity and the curvature of the data manifold.
From the abstract
LayerNorm and RMSNorm impose fundamentally different geometric constraints on their outputs, and this difference has a precise, quantifiable consequence for model complexity. We prove that LayerNorm's mean-centering step, by confining data to a linear hyperplane (through the origin), reduces the Local Learning Coefficient (LLC) of the subsequent weight matrix by exactly $m/2$ (where $m$ is its output dimension); RMSNorm's projection onto a sphere preserves the LLC entirely. This reduction is st…
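The geometric distinction is easy to check numerically. Below is a minimal sketch (my own illustration, not code from the paper, and with the learnable gain and bias omitted): LayerNorm's mean-centering confines its output to the hyperplane {v : sum(v) = 0} through the origin, while RMSNorm only rescales onto a sphere of radius sqrt(m), leaving the mean direction unconstrained.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Mean-center, then rescale to unit RMS: the output lies on the
    # linear hyperplane {v : sum(v) = 0} through the origin.
    c = x - x.mean()
    return c / np.sqrt((c ** 2).mean() + eps)

def rms_norm(x, eps=1e-6):
    # Rescale to unit RMS only: the output lies on a sphere of
    # radius sqrt(m), with no mean-centering constraint.
    return x / np.sqrt((x ** 2).mean() + eps)

x = np.array([1.0, 2.0, 3.0, 4.0])  # m = 4

print(np.isclose(layer_norm(x).sum(), 0.0))          # True: hyperplane constraint
print(np.isclose(rms_norm(x).sum(), 0.0))            # False: the mean survives
print(np.isclose(np.linalg.norm(rms_norm(x)), 2.0))  # True: radius sqrt(4) = 2
```

One way to read the paper's m/2 figure in light of this: if inputs to the next weight matrix always sum to zero, each of its m rows has one direction (along the all-ones vector) that can never affect the output, and in Singular Learning Theory each such exactly-flat parameter direction lowers the LLC by 1/2, giving a total reduction of m/2.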