AI & ML · Breaks Assumption

Alignment processes induce a 'normative bias' that makes LLMs worse at predicting real human behavior in strategic scenarios.

March 19, 2026

Original Paper

Alignment Makes Language Models Normative, Not Descriptive

Eilam Shapira, Moshe Tennenholtz, Roi Reichart

arXiv · 2603.17218

The Takeaway

The paper reveals a nearly 10:1 performance gap: base models outperform aligned models at predicting human choices in strategic games such as negotiation. This is a vital warning for researchers who use aligned LLMs as proxies for human social or economic behavior.

From the abstract

Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games: bargaining, persuasion, negotiation, and repeated matrix games. In these settings, base models outperform their aligned counterparts in predicting human choices by nearly 10:1, robustly across model families, prompt formulation…
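
To make the evaluation idea concrete, here is a minimal sketch (not the authors' code) of how one might compare a base model against its aligned counterpart on a single recorded human decision: score the log-likelihood each model assigns to the action the participant actually took. The model names, prompt format, and ultimatum-game scenario are illustrative assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def choice_log_likelihood(model, tokenizer, context, choice):
    """Total log-prob the model assigns to `choice` continuing `context`.

    Assumes the tokenization of `context` is a prefix of the tokenization
    of `context + choice` (true for typical BPE tokenizers when `choice`
    starts with a space).
    """
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]
    # Token at position p is predicted by the logits at position p - 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    total = 0.0
    for pos in range(ctx_len, full_ids.shape[1]):
        total += log_probs[pos - 1, full_ids[0, pos]].item()
    return total

# Hypothetical base/aligned pair; any such pair could be substituted.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
aligned = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Illustrative ultimatum-game record: a human rejected a low offer.
context = ("Ultimatum game. The proposer offers you $3 out of $10. "
           "The participant's decision was:")
human_choice = " reject"

for name, model in [("base", base), ("aligned", aligned)]:
    ll = choice_log_likelihood(model, tok, context, human_choice)
    print(f"{name} model log-likelihood of the observed choice: {ll:.3f}")
```

Aggregating such per-decision scores over thousands of recorded choices yields the kind of base-versus-aligned predictive comparison the abstract describes; the exact metric behind the 10:1 figure is specified in the paper itself.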