AI & ML · Breaks Assumption

A massive study of 19 LLMs reveals that subtle identity cues in names and dialects systematically bias automated text annotation.

arXiv · March 17, 2026 · 2603.13891

Petter Törnberg

The Takeaway

A critical finding for practitioners using LLMs for data labeling or social science research: identity-based stereotypes (e.g., around professionalism or aggression) surface in LLM judgments even when prompts are neutral, potentially poisoning downstream datasets.

From the abstract

Large language models (LLMs) are increasingly used for automated text annotation in tasks ranging from academic research to content moderation and hiring. Across 19 LLMs and two experiments totaling more than 4 million annotation judgments, we show that subtle identity cues embedded in text systematically bias annotation outcomes in ways that mirror racial stereotypes. In a names-based experiment spanning 39 annotation tasks, texts containing names associated with Black individuals are rated as
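To make the setup concrete, below is a minimal, hypothetical sketch of the kind of name-substitution audit the study scales up. The template, the two name groups, the 1-to-5 professionalism rating, and the annotate() hook are illustrative assumptions for this digest, not the paper's actual tasks, models, or code.

import statistics
from typing import Callable, Dict

# Illustrative stand-ins: one template and two name groups; the study
# itself spans 39 annotation tasks, 19 models, and millions of judgments.
TEMPLATE = "{name} wrote: 'I am writing to express my interest in the analyst role.'"
NAMES = {
    "group_a": ["Emily", "Greg", "Anne"],
    "group_b": ["Lakisha", "Jamal", "Keisha"],
}

def audit(annotate: Callable[[str], float]) -> Dict[str, float]:
    # Rate identical text with only the embedded name varied,
    # then compare the mean rating per name group.
    return {
        group: statistics.mean(annotate(TEMPLATE.format(name=n)) for n in names)
        for group, names in NAMES.items()
    }

# Usage (hypothetical): plug in any LLM call that returns a numeric rating, e.g.
#   def annotate(text): return float(my_llm("Rate professionalism 1-5:\n" + text))
# A persistent gap between groups on an identity-irrelevant task is the
# bias signature the study measures at scale.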