Alignment processes induce a 'normative bias' that makes LLMs worse at predicting real human behavior in strategic scenarios.
March 19, 2026
Original Paper
Alignment Makes Language Models Normative, Not Descriptive
arXiv · 2603.17218
The Takeaway
The paper documents a nearly 10:1 gap: base models outperform their aligned counterparts at predicting real human choices in strategic games such as bargaining and negotiation. This is a vital warning for researchers using aligned LLMs as proxies for human social or economic behavior.
From the abstract
Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games: bargaining, persuasion, negotiation, and repeated matrix games. In these settings, base models outperform their aligned counterparts in predicting human choices by nearly 10:1, robustly across model families and prompt formulations.
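To make the comparison concrete, here is a minimal sketch of how one might score a base/aligned model pair on observed human decisions: compute the log-probability each model assigns to the action a human actually took, given the game context. This is an assumed setup, not necessarily the paper's procedure, and the model names, prompt, and action below are hypothetical placeholders.

```python
# Sketch: score how much probability a model assigns to an observed human
# action. Model names, the prompt, and the action are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def action_logprob(model, tokenizer, prompt: str, action: str) -> float:
    """Sum of log-probabilities the model assigns to `action` given `prompt`.

    Note: assumes tokenizing prompt + action splits cleanly at the boundary;
    real evaluations should handle tokenizer merge effects more carefully.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + action, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # (1, seq_len, vocab_size)
    # Log-probs at each position predict the *next* token.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # first target index inside the action
    return log_probs[start:].gather(1, targets[start:, None]).sum().item()

# Compare a base/aligned pair on one hypothetical bargaining decision.
prompt = "You are the buyer. The seller offers $80. You respond: "
human_action = "counter with $60"
for name in ["meta-llama/Llama-3.1-8B", "meta-llama/Llama-3.1-8B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    mdl = AutoModelForCausalLM.from_pretrained(name)
    print(name, action_logprob(mdl, tok, prompt, human_action))
```

Averaging such log-likelihoods over thousands of real decisions would give per-model predictive scores of the kind the abstract's 10:1 comparison suggests; the higher-likelihood model is the better descriptive model of human play.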