Large language models can recognize when they lack the skill to solve a problem, yet they give a wrong answer anyway.
April 25, 2026
Original Paper
MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models
arXiv · 2604.19809
The Takeaway
A specific metacognitive gap exists: AI systems possess internal knowledge of their own incompetence yet lack the architectural wiring to act on it. The previous assumption was that if a model could predict its own failure, it could be prompted to stop or ask for help. This research shows that models fail to translate that self-awareness into correct actions across multiple domains, and that imposing external constraints is currently the only way to bridge the gap between knowing and doing. The practical implication: adding more safety prompts or more fine-tuning won't stop hallucination if the model cannot actually use its own confidence scores to override a generated response.
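To make the "external constraint" idea concrete, here is a minimal sketch of an abstention gate that lives outside the model. The interface (`self_assess`, `generate`) and the threshold value are illustrative assumptions, not the paper's API; the point is that the decision to answer is enforced by code, not left to the model's own generation loop.

```python
ABSTAIN_THRESHOLD = 0.5  # assumed cutoff; in practice tuned per task


def answer_or_abstain(model, prompt: str) -> str:
    """Answer only when the model's self-reported confidence clears an external bar."""
    # 1. Ask the model to estimate its own chance of answering correctly.
    confidence = model.self_assess(prompt)  # hypothetical call returning a float in [0, 1]

    # 2. An external rule, not the model, decides whether generation proceeds.
    if confidence < ABSTAIN_THRESHOLD:
        return "Not confident enough to answer this reliably."

    # 3. Generate only after the externally enforced check passes.
    return model.generate(prompt)  # hypothetical text-generation call
```

The gate is trivial by design: it shows that the confidence signal the paper says models already possess can only change behavior when something outside the model consumes it.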
From the abstract
We introduce MIRROR, a benchmark comprising eight experiments across four metacognitive levels that evaluates whether large language models can use self-knowledge to make better decisions. We evaluate 16 models from 8 labs across approximately 250,000 evaluation instances using five independent behavioral measurement channels. Core experiments are run across the full model roster; experiments with specialized infrastructure requirements report explicitly marked model subsets. We find two phenomena …
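For readers who think in code, here is one way the benchmark's reported structure (four levels, eight experiments, five measurement channels, a 16-model roster with optional subsets) could be represented. This is a sketch based only on the abstract; the field names are placeholders, not MIRROR's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    name: str
    level: int                              # metacognitive level, 1..4
    channels: list[str]                     # behavioral measurement channels used
    model_subset: list[str] | None = None   # None means the full 16-model roster


@dataclass
class Benchmark:
    experiments: list[Experiment] = field(default_factory=list)

    def models_for(self, exp: Experiment, full_roster: list[str]) -> list[str]:
        # Core experiments run on every model; specialized ones declare a subset.
        return exp.model_subset or full_roster
```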