Truncated-Reasoning Self-Distillation (TRSD) allows models to maintain accuracy even when their chain-of-thought traces are heavily shortened.
March 17, 2026
Original Paper
Learning from Partial Chain-of-Thought via Truncated-Reasoning Self-Distillation
arXiv · 2603.13274
The Takeaway
This work addresses the heavy compute overhead of 'reasoning' models by decoupling final-answer accuracy from trace length. It lets practitioners deploy reasoning models under much smaller token budgets without the steep performance trade-offs that truncation usually brings.
From the abstract
Reasoning-oriented language models achieve strong performance by generating long chain-of-thought traces at inference time. However, this capability comes with substantial and often excessive computational cost, which can materialize in redundant or inefficient reasoning. We study this setting and introduce Truncated-Reasoning Self-Distillation (TRSD), a lightweight post-training procedure that encourages models to produce correct predictions from partial reasoning traces. In TRSD, a frozen teacher …
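The abstract describes training the model to answer correctly from partial reasoning traces. The data-construction step can be sketched roughly as follows — a minimal illustration only, assuming token-level truncation at fixed ratios; the function name, the ratio schedule, and the list-of-tokens representation are our assumptions, not the paper's actual recipe:

```python
from typing import List, Sequence, Tuple

def make_truncated_pairs(
    prompt: List[str],
    trace: List[str],
    answer: List[str],
    ratios: Sequence[float] = (0.25, 0.5, 0.75),
) -> List[Tuple[List[str], List[str]]]:
    """Build (input, target) training pairs in which the input keeps only
    a prefix of the full reasoning trace.

    A frozen teacher would supply `trace` and `answer` (e.g. by sampling a
    full chain-of-thought); the student is then trained to emit `answer`
    after seeing only the truncated prefix.
    """
    pairs = []
    for r in ratios:
        # Keep at least one trace token so the student always sees
        # *some* partial reasoning rather than the bare prompt.
        cut = max(1, int(len(trace) * r))
        truncated_input = prompt + trace[:cut]
        pairs.append((truncated_input, answer))
    return pairs

# Toy usage: a 4-step trace truncated at 25%, 50%, and 75%.
pairs = make_truncated_pairs(
    prompt=["Q:", "2+2?"],
    trace=["think1", "think2", "think3", "think4"],
    answer=["4"],
)
```

In an actual self-distillation loop, each pair would feed a standard language-modeling loss on the answer tokens (or a KL term against the teacher's answer distribution); this sketch only shows how shortened-trace inputs might be constructed.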