Achieves state-of-the-art LLM distillation using 10-25% of the data required by standard fine-tuning.
March 23, 2026
Original Paper
Probing to Refine: Reinforcement Distillation of LLMs via Explanatory Inversion
arXiv · 2603.19266
The Takeaway
The framework pairs Explanatory Inversion with a novel reinforcement learning bonus (EXGRPO) to force student models to learn the underlying logic rather than superficial patterns. This is a massive efficiency win for organizations trying to bake 'Big Model' reasoning into 7B-class models.
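The excerpt doesn't spell out EXGRPO's exact formulation, but the core idea of a group-relative policy-gradient scheme augmented with an explanation-quality bonus can be sketched. Everything here is illustrative, not the paper's method: the function name `exgrpo_advantages`, the bonus weight `beta`, and the scores are all assumptions.

```python
from statistics import mean, pstdev

def exgrpo_advantages(task_rewards, explanation_scores, beta=0.5):
    """Hypothetical sketch of a GRPO-style advantage with an
    explanation bonus (names and weighting are assumptions).

    task_rewards: per-completion task reward (e.g. answer correctness)
    explanation_scores: per-completion score for explanation quality
    beta: weight of the explanation bonus relative to the task reward
    """
    # Fold the explanation bonus into each sampled completion's reward.
    combined = [r + beta * e for r, e in zip(task_rewards, explanation_scores)]
    # Group-relative normalization, as in GRPO: subtract the group mean
    # and divide by the group standard deviation.
    mu = mean(combined)
    sigma = pstdev(combined) or 1.0  # guard against identical rewards
    return [(c - mu) / sigma for c in combined]

# Four sampled completions: two correct answers, varying explanation quality.
advs = exgrpo_advantages([1.0, 0.0, 1.0, 0.0], [0.8, 0.2, 0.4, 0.1])
```

Under this sketch, a correct answer with a strong explanation receives the largest advantage, so the policy gradient rewards the reasoning, not just the final answer.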
From the abstract
Distilling robust reasoning capabilities from large language models (LLMs) into smaller, computationally efficient student models remains an unresolved challenge. Despite recent advances, distilled models frequently suffer from superficial pattern memorization and subpar generalization. To overcome these limitations, we introduce a novel distillation framework that moves beyond simple mimicry to instill a deeper conceptual understanding. Our framework features two key innovations.