Pairwise distance structures inside a model's knowledge space act as a 'canary in a coal mine' for internal drift.
April 23, 2026
Original Paper
The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
arXiv · 2604.17698
The Takeaway
Detecting when an AI system is becoming unstable is currently very difficult. This research shows that the shape of a model's internal representations reveals its health: if the geometric stability shifts, the model is likely losing its ability to be steered. This internal geometry can be used to predict failures before they surface in the output, providing a real-time monitor for the reliability of an AI system. Monitoring the geometry of the latent space is more effective than watching the words the model says.
From the abstract
Reliable deployment of language models requires two capabilities that appear distinct but share a common geometric foundation: predicting whether a model will accept targeted behavioral control, and detecting when its internal structure degrades. We show that geometric stability, the consistency of a representation's pairwise distance structure, addresses both. Supervised Shesha variants that measure task-aligned geometric stability predict linear steerability with near-perfect accuracy ($\rho =
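The excerpt does not spell out how geometric stability is computed, but the definition it gives, the consistency of a representation's pairwise distance structure, admits a simple unsupervised instantiation. The sketch below is an assumption, not the paper's Shesha method: it compares Euclidean pairwise distances over a fixed probe batch at two snapshots and uses Spearman rank correlation as the consistency measure. The paper's supervised variants measure *task-aligned* stability, which this sketch does not attempt.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def geometric_stability(h_ref: np.ndarray, h_new: np.ndarray) -> float:
    """Consistency of pairwise distance structure between two representation
    snapshots, each of shape (n_probes, hidden_dim) over the same probe set.

    Returns a value near 1.0 when the geometry is preserved and near 0.0
    when the distance structure has drifted. This is an illustrative
    measure, not the paper's Shesha score.
    """
    d_ref = pdist(h_ref, metric="euclidean")  # condensed pairwise distances
    d_new = pdist(h_new, metric="euclidean")
    rho, _ = spearmanr(d_ref, d_new)          # rank agreement of the two geometries
    return float(rho)

# Toy check: a small perturbation preserves the distance structure,
# while an unrelated representation does not.
rng = np.random.default_rng(0)
h_ref = rng.standard_normal((64, 256))
h_stable = h_ref + 0.01 * rng.standard_normal((64, 256))
h_drifted = rng.standard_normal((64, 256))
print(geometric_stability(h_ref, h_stable))   # ~1.0: geometry intact
print(geometric_stability(h_ref, h_drifted))  # ~0.0: geometry lost
```

Spearman correlation is used here so the score depends only on the *ordering* of pairwise distances, making it invariant to monotone rescalings of the latent space; whether the paper makes the same choice is not stated in this excerpt.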