AI & ML Paradigm Shift

Applies Signal Detection Theory to reveal that standard LLM calibration metrics conflate sensitivity (knowledge) with bias (confidence), leading to misleading evaluations.

March 17, 2026

Original Paper

LLMs as Signal Detectors: Sensitivity, Bias, and the Temperature-Criterion Analogy

Jon-Paul Cacioli

arXiv · 2603.14893

The Takeaway

The paper demonstrates that changing the sampling temperature doesn't just shift the model's confidence (the criterion) but also changes its sensitivity (AUC). This methodology gives a more precise way to diagnose why LLMs hallucinate or act overconfident: is the failure one of sensitivity (the model can't tell right from wrong) or of bias (it is simply too eager to commit)?
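To make the decomposition concrete, here is a minimal sketch (not the paper's code) of the textbook equal-variance SDT computation: given each answer's confidence and whether it was correct, it separates sensitivity (d′) from bias (the criterion c). The function name, the 0.5 confidence threshold, and the correction for extreme rates are illustrative assumptions; the paper's full framework additionally fits an unequal-variance model, which this sketch does not attempt.

```python
import numpy as np
from scipy.stats import norm

def sdt_decompose(confidences, correct, threshold=0.5):
    """Equal-variance SDT decomposition of confidence-graded answers.

    Correct answers are treated as signal trials, incorrect answers as
    noise trials; a confidence at or above `threshold` counts as a
    "confident" (yes) response. Returns (d_prime, criterion).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    says_yes = confidences >= threshold

    # Hit rate: confident responses among correct answers.
    hit_rate = says_yes[correct].mean()
    # False-alarm rate: confident responses among incorrect answers.
    fa_rate = says_yes[~correct].mean()

    # Nudge 0/1 rates inward so the z-transform stays finite.
    n_signal, n_noise = correct.sum(), (~correct).sum()
    hit_rate = np.clip(hit_rate, 0.5 / n_signal, 1 - 0.5 / n_signal)
    fa_rate = np.clip(fa_rate, 0.5 / n_noise, 1 - 0.5 / n_noise)

    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa              # sensitivity: can it tell right from wrong?
    criterion = -0.5 * (z_hit + z_fa)   # bias: how eager is it to sound confident?
    return d_prime, criterion
```

Run on the same question set at two different temperatures, a drop in d′ alongside a shift in c is exactly the pattern the takeaway above describes, rather than a criterion shift alone.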

From the abstract

Large language models (LLMs) are evaluated for calibration using metrics such as Expected Calibration Error that conflate two distinct components: the model's ability to discriminate correct from incorrect answers (sensitivity) and its tendency toward confident or cautious responding (bias). Signal Detection Theory (SDT) decomposes these components. While SDT-derived metrics such as AUROC are increasingly used, the full parametric framework - unequal-variance model fitting, criterion estimation, …
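To illustrate the conflation the abstract points to, the synthetic example below (mine, not the paper's) applies a purely monotone "overconfidence" transform to well-calibrated scores: the ranking of correct versus incorrect answers, and hence AUROC, is untouched, but binned ECE balloons because only the bias component has moved.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: |bin accuracy - bin confidence|, weighted by bin mass."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=20_000)   # "true" per-question probability of being right
correct = rng.random(p.size) < p

calibrated = p                # confidence equals accuracy: well calibrated
overconfident = p ** 0.3      # same ranking, pushed toward 1: a pure bias shift

# Discrimination (sensitivity) is identical -- the ordering never changed...
print("AUROC calibrated:   ", roc_auc_score(correct, calibrated))
print("AUROC overconfident:", roc_auc_score(correct, overconfident))
# ...but ECE lumps the bias shift in with everything else.
print("ECE calibrated:     ", expected_calibration_error(calibrated, correct))
print("ECE overconfident:  ", expected_calibration_error(overconfident, correct))
```

Nothing about the paper's actual experiments is reproduced here; the point is only that a bias-only change can move ECE a great deal while leaving discrimination untouched.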