AI & ML Paradigm Shift

Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.

arXiv · March 13, 2026 · 2603.12248

Samy Jelassi, Mujin Kwun, Rosie Zhao, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade, Carles Domingo-Enrich

Why it matters

Energy-Based Fine-Tuning (EBFT) offers a path to aligning models on open-ended tasks where simple pass/fail verifiers don't exist. By matching features of on-policy rollouts, it provides a denser semantic signal than standard teacher-forced cross-entropy training and outperforms SFT.

From the abstract

Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequence-level statistics of the completion distribution, providing dense semantic feedback without requiring a task-specific verifier or preference model. To optimize this objective efficiently, we propose […]
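To make the idea of "matching sequence-level statistics" concrete, here is a minimal toy sketch of a generic feature-matching (moment-matching) loss: compare the mean feature vector of on-policy rollouts against that of reference completions. The feature map `features` and the squared-distance loss are illustrative assumptions, not the paper's actual formulation (which uses learned semantic features and an efficient optimization scheme).

```python
import numpy as np

def features(seq):
    """Hypothetical sequence-level feature map: length and mean token id.
    The paper's features are semantic; this stands in for illustration."""
    seq = np.asarray(seq, dtype=float)
    return np.array([len(seq), seq.mean()])

def feature_matching_loss(model_rollouts, reference_completions):
    """Squared distance between mean features of on-policy rollouts and
    reference completions -- a generic moment-matching objective."""
    phi_model = np.mean([features(s) for s in model_rollouts], axis=0)
    phi_ref = np.mean([features(s) for s in reference_completions], axis=0)
    return float(np.sum((phi_model - phi_ref) ** 2))

# Toy token-id sequences standing in for sampled completions.
refs = [[3, 1, 4], [1, 5, 9, 2]]
rollouts_matched = [[3, 1, 4], [1, 5, 9, 2]]
rollouts_off = [[7, 7], [8, 8, 8, 8, 8]]

print(feature_matching_loss(rollouts_matched, refs))  # 0.0: statistics match
print(feature_matching_loss(rollouts_off, refs))      # > 0: statistics differ
```

The key contrast with cross-entropy is that the signal depends on statistics of the model's own rollouts rather than on per-token agreement with a teacher-forced target.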