Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.
arXiv · March 13, 2026 · 2603.12248
Why it matters
Energy-Based Fine-Tuning (EBFT) provides a path to align models on open-ended tasks where simple pass/fail verifiers don't exist. By matching statistics of on-policy rollout features, it outperforms SFT, providing denser semantic feedback than teacher-forced cross-entropy training.
From the abstract
Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequence-level statistics of the completion distribution, providing dense semantic feedback without requiring a task-specific verifier or preference model. To optimize this objective efficiently, we propose
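The feature-matching idea from the abstract can be sketched in a toy form: compute a sequence-level statistic over model rollouts and over reference completions, then penalize the gap between their means. This is a minimal illustration, not the paper's method; the bag-of-tokens feature map and the function names (`sequence_features`, `feature_matching_loss`) are hypothetical stand-ins for whatever features the authors actually use, and the proposed optimizer is not shown here.

```python
import numpy as np

def sequence_features(seqs, vocab_size):
    # Toy sequence-level statistic (hypothetical): normalized
    # bag-of-tokens frequency vector for each sequence.
    feats = np.zeros((len(seqs), vocab_size))
    for i, s in enumerate(seqs):
        for t in s:
            feats[i, t] += 1.0
        feats[i] /= max(len(s), 1)
    return feats

def feature_matching_loss(model_rollouts, reference_completions, vocab_size):
    # Match the mean sequence-level statistics of the completion
    # distribution (rollouts) to those of the reference data,
    # rather than matching per-token predictions as CE does.
    mu_model = sequence_features(model_rollouts, vocab_size).mean(axis=0)
    mu_ref = sequence_features(reference_completions, vocab_size).mean(axis=0)
    return float(np.sum((mu_model - mu_ref) ** 2))
```

When the rollout statistics already match the reference statistics the loss is zero; any distributional mismatch in the chosen features yields a positive penalty, which is the dense, verifier-free signal the abstract describes.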