You can't distill an AI's 'personality' or uncertainty behaviors into small models without breaking the underlying logic.
April 17, 2026
Original Paper
Disposition Distillation at Small Scale: A Three-Arc Negative Result
arXiv · 2604.11867
The Takeaway
There's a widespread hope that we can take the nuanced reasoning and self-awareness of a GPT-4-class model and shrink it into a 7B model. This study presents a 'three-arc negative result': behavioral dispositions, like knowing when you don't know, collapse into mere stylistic mimicry during distillation. If the model is too small, it learns to sound humble or certain without the cognitive backbone to support those signals. The practical upshot is that practitioners should stop trying to 'distill' wisdom and instead look for structural ways to encode these behaviors. The result suggests that some cognitive properties may be inseparable from scale.
From the abstract
We set out to train behavioral dispositions (self-verification, uncertainty acknowledgment, feedback integration) into small language models (0.6B to 2.3B effective parameters) through a four-stage all-MIT distillation pipeline, with follow-on experiments on inference-time attention-head interventions and a frozen-base confidence-gated sidecar. An internal draft reported +33.9-point MCAS and +15.3-point HumanEval gains on a Qwen3-0.6B student; a second-pass sanity check falsified both numbers be…
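The abstract does not spell out the sidecar architecture, but the idea of a 'frozen-base confidence-gated sidecar' can be illustrated with a minimal sketch: the frozen base model's logits pass through unchanged unless a small trainable sidecar is confident enough to adjust them. All names, shapes, and the gating rule below are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

class ConfidenceGatedSidecar:
    """Hypothetical sketch: a small adapter that only modifies the
    frozen base model's logits when its confidence probe fires."""

    def __init__(self, dim, vocab, threshold=0.7):
        # Sidecar output head and a linear confidence probe
        # (randomly initialized here; trained in practice).
        self.W_logit = rng.normal(size=(dim, vocab)) * 0.1
        self.w_conf = rng.normal(size=dim) * 0.1
        self.threshold = threshold

    def __call__(self, hidden, base_logits):
        # Scalar confidence in (0, 1) read off the base's hidden state.
        conf = 1.0 / (1.0 + np.exp(-hidden @ self.w_conf))
        if conf < self.threshold:
            # Gate closed: frozen base passes through untouched.
            return base_logits
        # Gate open: sidecar adds a confidence-weighted correction.
        delta = hidden @ self.W_logit
        return base_logits + conf * delta

# Illustrative usage with a fake hidden state and base logits.
hidden = rng.normal(size=16)
base_logits = rng.normal(size=32)
sidecar = ConfidenceGatedSidecar(dim=16, vocab=32)
out = sidecar(hidden, base_logits)
```

Because the base stays frozen, a design like this can fail gracefully: with the gate closed, behavior is exactly the base model's, which is one plausible reason the authors explored it after the distillation arcs failed.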