Knowledge distillation can be performed by injecting 'experience' into prompts rather than updating model weights.
March 31, 2026
Original Paper
TED: Training-Free Experience Distillation for Multimodal Reasoning
arXiv · 2603.26778
The Takeaway
TED achieves significant performance gains (e.g., +7.5% on MathVision) by refining reasoning patterns into in-context experiences. This allows for 'live' distillation in resource-constrained environments where parameter updates are impossible or too expensive.
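The core idea — injecting distilled reasoning experience into the prompt instead of updating student weights — can be sketched in a few lines. This is an illustrative approximation, not the paper's implementation: the experience texts and the `build_distilled_prompt` helper are hypothetical, and in practice the prompt would be passed to whatever multimodal model call the deployment uses.

```python
# Sketch of context-based distillation: reasoning "experiences" live in a
# bank of text snippets and are prepended to each query, so no parameter
# updates are ever needed. All names and experience strings are illustrative.

EXPERIENCE_BANK = [
    "Restate all given quantities from the figure before reasoning.",
    "Verify each intermediate arithmetic result before continuing.",
]

def build_distilled_prompt(question: str, experiences: list[str]) -> str:
    """Inject in-context experiences ahead of the question."""
    header = "Useful reasoning experiences:\n" + "\n".join(
        f"- {e}" for e in experiences
    )
    return f"{header}\n\nQuestion: {question}\nAnswer step by step."

prompt = build_distilled_prompt("What is 12 * (3 + 4)?", EXPERIENCE_BANK)
print(prompt)
```

Because the "update target" is this prompt context rather than the model's weights, the experience bank can be refreshed live as new teacher traces arrive, which is what makes the approach viable where fine-tuning is impossible or too expensive.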
From the abstract
Knowledge distillation is typically realized by transferring a teacher model's knowledge into a student's parameters through supervised or reinforcement-based optimization. While effective, such approaches require repeated parameter updates and large-scale training data, limiting their applicability in resource-constrained environments. In this work, we propose TED, a training-free, context-based distillation framework that shifts the update target of distillation from model parameters to an in-