A novel approach to upcycle multiple dense expert models into a unified Mixture-of-Experts model without any additional training.
April 1, 2026
Original Paper
Training-Free Dynamic Upcycling of Expert Language Models
arXiv · 2603.29765
The Takeaway
Leveraging a closed-form ridge regression solution, the method dynamically adds experts to a single multitask model while eliminating the cost of fine-tuning and the risk of catastrophic forgetting.
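The paper's exact merging formulation isn't reproduced here, but the closed-form ridge regression it leans on is the standard one: the weights solving a regularized least-squares problem can be written as w = (XᵀX + λI)⁻¹Xᵀy, with no iterative training required. A minimal NumPy sketch (the function name and toy data are illustrative, not from the paper):

```python
import numpy as np

def ridge_closed_form(X, y, lam=1.0):
    """Solve ridge regression in closed form: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    # Solving the linear system is more stable than forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy example: recover known weights from noiseless data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w
w = ridge_closed_form(X, y, lam=1e-6)  # small lam -> near least-squares fit
```

Because the solution is a single linear solve, fitting (or refitting when a new expert is added) is cheap and deterministic, which is what makes a training-free pipeline plausible.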
From the abstract
Large Language Models (LLMs) have achieved remarkable performance on a wide range of specialized tasks, exhibiting strong problem-solving capabilities. However, training these models is prohibitively expensive, and they often lack domain-specific expertise because they rely on general knowledge datasets. Expertise finetuning can address this issue; however, it often leads to overspecialization, and developing a single multi-domain expert remains difficult due to diverging objectives. […]