LLMs don't value things on an absolute scale; they build their internal 'value systems' through relative comparisons, just like humans.
April 14, 2026
Original Paper
Relational Preference Encoding in Looped Transformer Internal States
arXiv · 2604.09870
The Takeaway
Looped transformers encode human preferences as pairwise relations rather than absolute points. Linear probes trained on differences between internal states substantially outperform probes that score each response independently, suggesting the model's judgment is fundamentally relational and comparative.
From the abstract
We investigate how looped transformers encode human preference in their internal iteration states. Using Ouro-2.6B-Thinking, a 2.6B-parameter looped transformer with iterative refinement, we extract hidden states from each loop iteration and train lightweight evaluator heads (~5M parameters) to predict human preference on the Anthropic HH-RLHF dataset. Our pairwise evaluator achieves 95.2% test accuracy on 8,552 unseen examples, surpassing a full-batch L-BFGS probe (84.5%) while the base model r
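The core idea of a pairwise probe can be sketched with synthetic data: score each pair by a linear readout on the *difference* of the two responses' hidden states, Bradley-Terry style, rather than scoring each response in isolation. Everything below is illustrative — the dimensions, the synthetic "preference direction", and the plain gradient-descent training loop are assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: hidden states (dim d) for the chosen and rejected
# response in each preference pair. We plant a synthetic "preference
# direction" w_true so that chosen states lean toward it.
d, n = 64, 500
w_true = rng.normal(size=d)
h_chosen = rng.normal(size=(n, d)) + 0.5 * w_true
h_rejected = rng.normal(size=(n, d)) - 0.5 * w_true

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pairwise (Bradley-Terry style) probe:
#   P(chosen preferred) = sigmoid(w · (h_chosen - h_rejected))
# Train w by gradient descent on the logistic loss over difference
# vectors; all labels are 1 (the chosen response is preferred).
diff = h_chosen - h_rejected
w = np.zeros(d)
for _ in range(200):
    p = sigmoid(diff @ w)
    grad = diff.T @ (p - 1.0) / n
    w -= 0.5 * grad

# Pairwise accuracy: the probe prefers "chosen" when w · diff > 0.
acc = float(np.mean(diff @ w > 0))
```

An "independent evaluator" baseline would instead fit a scalar score to `h_chosen` and `h_rejected` separately and compare the two scores afterward; the paper's claim is that the difference-based (relational) readout recovers preference structure that per-response scoring misses.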