LLMs don't value things on an absolute scale; they build their internal 'value systems' through relative comparisons, just like humans.
April 14, 2026
Original Paper
Relational Preference Encoding in Looped Transformer Internal States
arXiv · 2604.09870
The Takeaway
Looped transformers encode human preferences as pairwise relations rather than absolute points. Linear probes trained on differences between internal states substantially outperform probes that score each response independently, suggesting the model's judgment is fundamentally relational and comparative.
From the abstract
We investigate how looped transformers encode human preference in their internal iteration states. Using Ouro-2.6B-Thinking, a 2.6B-parameter looped transformer with iterative refinement, we extract hidden states from each loop iteration and train lightweight evaluator heads (~5M parameters) to predict human preference on the Anthropic HH-RLHF dataset. Our pairwise evaluator achieves 95.2% test accuracy on 8,552 unseen examples, surpassing a full-batch L-BFGS probe (84.5%) while the base model r
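The core idea of a pairwise probe can be sketched with synthetic data: score each pair by a linear readout on the *difference* of the two responses' hidden states, Bradley-Terry style, rather than scoring each response in isolation. Everything below is illustrative — the dimensions, the synthetic "preference direction", and the plain gradient-descent training loop are assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: hidden states (dim d) for the chosen and rejected
# response in each preference pair. We plant a synthetic "preference
# direction" w_true so that chosen states lean toward it.
d, n = 64, 500
w_true = rng.normal(size=d)
h_chosen = rng.normal(size=(n, d)) + 0.5 * w_true
h_rejected = rng.normal(size=(n, d)) - 0.5 * w_true

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pairwise (Bradley-Terry style) probe:
#   P(chosen preferred) = sigmoid(w · (h_chosen - h_rejected))
# Train w by gradient descent on the logistic loss over difference
# vectors; all labels are 1 (the chosen response is preferred).
diff = h_chosen - h_rejected
w = np.zeros(d)
for _ in range(200):
    p = sigmoid(diff @ w)
    grad = diff.T @ (p - 1.0) / n
    w -= 0.5 * grad

# Pairwise accuracy: the probe prefers "chosen" when w · diff > 0.
acc = float(np.mean(diff @ w > 0))
```

An "independent evaluator" baseline would instead fit a scalar score to `h_chosen` and `h_rejected` separately and compare the two scores afterward; the paper's claim is that the difference-based (relational) readout recovers preference structure that per-response scoring misses.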