Large Language Models can maintain performance with only 16-64 unique weight values per matrix, as only the relative rank of weights matters.
March 19, 2026
Original Paper
Only relative ranks matter in weight-clustered large language models
arXiv · 2603.17917
The Takeaway
This discovery suggests that precise weight magnitudes are largely irrelevant compared to their ordinal ranking. It provides a training-free path for extreme LLM compression and suggests that current quantization methods may be focusing on the wrong statistical properties.
From the abstract
Large language models (LLMs) contain billions of parameters, yet many exact values are not essential. We show that what matters most is the relative rank of weights (whether one connection is stronger or weaker than another) rather than precise magnitudes. To reduce the number of unique weight values, we apply weight clustering to pretrained models, replacing every weight matrix with K shared values from K-means. For Llama 3.1-8B-Instruct and SmolLM2-135M, reducing each matrix to only 16-64 distinct values largely preserves performance.
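The clustering step described in the abstract can be sketched as 1-D K-means over a single weight matrix: every entry is snapped to the nearest of K shared centroids. This is a minimal illustration, not the paper's implementation; the function name, quantile initialization, and iteration count are assumptions.

```python
import numpy as np

def cluster_weights(W, K=16, iters=25):
    """Illustrative sketch: replace every entry of W with one of K shared
    values via Lloyd's algorithm on the flattened weights.
    K=16 and iters=25 are arbitrary choices, not values from the paper."""
    flat = W.ravel()
    # Initialize centroids at evenly spaced quantiles of the weight distribution.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, K))
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(K):
            if np.any(idx == k):
                centroids[k] = flat[idx == k].mean()
    return centroids[idx].reshape(W.shape)

# Toy usage: a random 64x64 "weight matrix" reduced to at most 16 unique values.
W = np.random.default_rng(1).normal(size=(64, 64))
W_q = cluster_weights(W, K=16)
```

Because 1-D K-means assigns weights to centroids that are ordered along the real line, the clustered matrix preserves the relative ordering of weights across clusters, which is the property the paper argues carries most of the signal.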