Standard alignment metrics like CKA (Centered Kernel Alignment) and RSA (Representational Similarity Analysis) systematically fail when comparing networks in superposition, often leading to false conclusions about model similarity.
April 2, 2026
Original Paper
Measuring the Representational Alignment of Neural Systems in Superposition
arXiv · 2604.00208
The Takeaway
As the field shifts toward sparse autoencoders and mechanistic interpretability, this paper shows that our primary tools for comparing internal representations are fundamentally flawed for compressed features. The practical implication: researchers should align the underlying sparse features rather than the raw neural activations.
From the abstract
Comparing the internal representations of neural networks is a central goal in both neuroscience and machine learning. Standard alignment metrics operate on raw neural activations, implicitly assuming that similar representations produce similar activity patterns. However, neural systems frequently operate in superposition, encoding more features than they have neurons via linear compression. We derive closed-form expressions showing that superposition systematically deflates Representational Similarity…
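To make the failure mode concrete, here is a minimal sketch (not from the paper; all names and parameter choices are illustrative assumptions): two toy "networks" encode the exact same sparse feature code but compress it into a small neuron basis through different random projections, i.e. superposition. Linear CKA computed on the raw activations comes out well below the feature-level value of 1.0, even though the underlying representations are identical.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices (samples x units)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return num / den

rng = np.random.default_rng(0)
n, k, d = 500, 100, 10  # samples, sparse features, neurons (d << k)

# Both "networks" share the exact same sparse feature code...
z = rng.random((n, k)) * (rng.random((n, k)) < 0.05)

# ...but compress it into d neurons via different random maps (superposition).
x1 = z @ rng.normal(size=(k, d))
x2 = z @ rng.normal(size=(k, d))

print(f"feature-level CKA:  {linear_cka(z, z):.3f}")   # identical codes -> 1.000
print(f"raw-activation CKA: {linear_cka(x1, x2):.3f}")  # deflated below 1.0
```

Comparing the feature matrices directly recovers perfect similarity, while the activation-level score is deflated purely by the compression, which is the artifact the paper formalizes.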