Simple Self-Distillation (SSD) improves LLM code generation (e.g., Qwen3-30B) by 13% Pass@1 without any external verifiers or teacher models.
April 2, 2026
Original Paper
Embarrassingly Simple Self-Distillation Improves Code Generation
arXiv · 2604.01193
The Takeaway
It shows that an LLM can significantly improve its own code generation simply by fine-tuning on its own high-temperature samples, suggesting that post-training data can be bootstrapped effectively without expensive RL or human labeling.
From the abstract
Can a large language model (LLM) improve at code generation using only its own raw outputs, without a verifier, a teacher model, or reinforcement learning? We answer in the affirmative with simple self-distillation (SSD): sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning. SSD improves Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems.
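The two-step recipe from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `sample_solution` and `fine_tune` are hypothetical stand-ins for the model's sampling and SFT stages, and the prompt names are invented.

```python
# Sketch of simple self-distillation (SSD): sample raw solutions from the
# model itself, then run standard SFT on those samples. No verifier,
# teacher model, or reward signal filters the data.

def sample_solution(model, prompt, temperature=1.0):
    # Stand-in: the real step draws a code solution from the LLM using
    # the paper's temperature/truncation settings.
    return f"solution_for:{prompt}@T={temperature}"

def fine_tune(model, dataset):
    # Stand-in for ordinary supervised fine-tuning on
    # (prompt, self-generated solution) pairs.
    return {"base": model, "sft_examples": len(dataset)}

def simple_self_distillation(model, prompts, samples_per_prompt=4):
    # Step 1: collect the model's own high-temperature outputs, unfiltered.
    dataset = [
        (p, sample_solution(model, p))
        for p in prompts
        for _ in range(samples_per_prompt)
    ]
    # Step 2: standard SFT on the self-generated dataset.
    return fine_tune(model, dataset), dataset

student, data = simple_self_distillation(
    "qwen3-30b-instruct", ["two-sum", "lru-cache"]
)
```

The point the sketch makes is structural: every training example comes straight from the model's own sampling pass, so the entire pipeline is the kind of plain SFT loop most post-training stacks already have.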