Aligns a base model to a target model's behavior by optimizing the 'data mixture' weights instead of using RLHF or DPO.
March 18, 2026
Original Paper
Domain Mixture Design via Log-Likelihood Differences for Aligning Language Models with a Target Model
arXiv · 2603.16622
The Takeaway
Rather than fine-tuning on specific outputs, this method treats models as points in log-likelihood space, computes the direction from the base model toward a target model (like GPT-4), and re-weights the pretraining data mixture so that training moves the base model along that direction. It suggests that 'alignment' can be built into the data recipe itself rather than added as a post-training patch.
From the abstract
Instead of directly distilling a language model, this study addresses the problem of aligning a base model with a target model in distribution by designing the domain mixture of training data for pretraining or continued pretraining as a fixed training recipe. We propose a method for determining domain weights by viewing models as points in log-likelihood space and aligning the training update direction with the direction toward the target model. Experiments with NanoGPT show that the proposed method…
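The paper's exact weighting rule isn't given in this excerpt, but the core idea can be sketched: measure each domain's log-likelihood under the base and target models, take the gap as the direction toward the target, and turn those gaps into mixture weights. The function name `domain_weights`, the softmax weighting, and the example numbers below are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def domain_weights(base_ll, target_ll, temperature=1.0):
    """Hypothetical sketch: map per-domain log-likelihood gaps to mixture weights.

    base_ll, target_ll: average per-token log-likelihoods of held-out text
    from each domain under the base and target models, respectively.
    """
    base_ll = np.asarray(base_ll, dtype=float)
    target_ll = np.asarray(target_ll, dtype=float)
    # Direction toward the target in log-likelihood space: positive entries
    # mark domains where the target assigns more likelihood than the base.
    delta = target_ll - base_ll
    # A softmax (an assumption here) turns the gaps into a valid mixture:
    # non-negative weights that sum to 1, upweighting larger gaps.
    z = delta / temperature
    z -= z.max()  # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

# Example: three hypothetical domains (e.g., code, web, dialogue) with
# made-up average per-token log-likelihoods under the two models.
base = [-2.1, -1.8, -2.5]
target = [-1.9, -1.8, -2.0]
print(domain_weights(base, target).round(3))
```

The domain with the largest base-to-target gap (the third one) receives the largest share of the mixture, so continued pretraining spends more tokens where the base model diverges most from the target.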