AI & ML Scaling Insight

A quantitative model that predicts the performance gain of merging independent LLM specialists before committing compute.

March 25, 2026

Original Paper

KALAVAI: Predicting When Independent Specialist Fusion Works -- A Quantitative Model for Post-Hoc Cooperative LLM Training

Ramchand Kumaresan

arXiv · 2603.22755

The Takeaway

Practitioners can use a simple linear formula based on model divergence to estimate whether fusing domain-specific models will yield a significant improvement. This is especially practical for decentralized training, or for organizations looking to combine multiple fine-tuned versions of the same base model.

From the abstract

Independently trained domain specialists can be fused post-hoc into a single model that outperforms any individual specialist, and the gain is predictable: gain = 0.82 × divergence − 2.72 (R² = 0.856, n = 6, fit over 3–26% divergence). This enables practitioners to estimate cooperative value before committing compute. Below ~3.3% divergence, gains approach zero. In the KALAVAI protocol, contributors fine-tune copies of a shared checkpoint independently, then submit them for lightweight MoE routing (500 step…
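The paper's linear fit can be turned into a one-line pre-commit check. The sketch below is an illustration, not code from the paper: the function names are ours, and `divergence_pct` assumes divergence is expressed as a percentage (the 3–26% range the fit was trained on). Note the break-even point, 2.72 / 0.82 ≈ 3.3%, matches the abstract's ~3.3% threshold.

```python
def predicted_gain(divergence_pct: float) -> float:
    """Predicted fusion gain (%) from the paper's linear fit:
    gain = 0.82 * divergence - 2.72  (R^2 = 0.856, n = 6).
    Only meaningful inside the fitted 3-26% divergence range."""
    return 0.82 * divergence_pct - 2.72

def fusion_worthwhile(divergence_pct: float, min_gain_pct: float = 0.0) -> bool:
    """True when the predicted gain clears a chosen threshold.
    With the default threshold, break-even is 2.72/0.82 ~= 3.3% divergence."""
    return predicted_gain(divergence_pct) > min_gain_pct

print(predicted_gain(10.0))     # 0.82*10 - 2.72 = 5.48 (% gain)
print(fusion_worthwhile(3.0))   # below break-even -> False
print(fusion_worthwhile(10.0))  # well above break-even -> True
```

In practice the decision rule is just "is my specialists' divergence comfortably above ~3.3%?"; the formula adds a rough estimate of how large the payoff should be.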