You can't distill an AI's 'personality' or uncertainty behaviors into small models without breaking the underlying logic.
April 17, 2026
Original Paper
Disposition Distillation at Small Scale: A Three-Arc Negative Result
arXiv · 2604.11867
The Takeaway
There's a widespread hope that we can take the nuanced reasoning and self-awareness of a GPT-4-class model and shrink it into a 7B model. This study presents a 'three-arc negative result': behavioral dispositions, like knowing when you don't know, collapse into mere stylistic mimicry during distillation. If the model is too small, it learns to sound humble or certain without the cognitive backbone to support those signals. The practical upshot is that practitioners should stop trying to 'distill' wisdom and instead look for structural ways to encode these behaviors. The result suggests that some cognitive properties may be inseparable from scale.
From the abstract
We set out to train behavioral dispositions (self-verification, uncertainty acknowledgment, feedback integration) into small language models (0.6B to 2.3B effective parameters) through a four-stage all-MIT distillation pipeline, with follow-on experiments on inference-time attention-head interventions and a frozen-base confidence-gated sidecar. An internal draft reported +33.9-point MCAS and +15.3-point HumanEval gains on a Qwen3-0.6B student; a second-pass sanity check falsified both numbers be…
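The abstract does not spell out the sidecar architecture, but the idea of a 'frozen-base confidence-gated sidecar' can be illustrated with a minimal sketch: the frozen base model's logits pass through unchanged unless a small trainable sidecar is confident enough to adjust them. All names, shapes, and the gating rule below are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

class ConfidenceGatedSidecar:
    """Hypothetical sketch: a small adapter that only modifies the
    frozen base model's logits when its confidence probe fires."""

    def __init__(self, dim, vocab, threshold=0.7):
        # Sidecar output head and a linear confidence probe
        # (randomly initialized here; trained in practice).
        self.W_logit = rng.normal(size=(dim, vocab)) * 0.1
        self.w_conf = rng.normal(size=dim) * 0.1
        self.threshold = threshold

    def __call__(self, hidden, base_logits):
        # Scalar confidence in (0, 1) read off the base's hidden state.
        conf = 1.0 / (1.0 + np.exp(-hidden @ self.w_conf))
        if conf < self.threshold:
            # Gate closed: frozen base passes through untouched.
            return base_logits
        # Gate open: sidecar adds a confidence-weighted correction.
        delta = hidden @ self.W_logit
        return base_logits + conf * delta

# Illustrative usage with a fake hidden state and base logits.
hidden = rng.normal(size=16)
base_logits = rng.normal(size=32)
sidecar = ConfidenceGatedSidecar(dim=16, vocab=32)
out = sidecar(hidden, base_logits)
```

Because the base stays frozen, a design like this can fail gracefully: with the gate closed, behavior is exactly the base model's, which is one plausible reason the authors explored it after the distillation arcs failed.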