Introduces the first high-performing open-source metric for per-sample AI music quality evaluation.
March 25, 2026
Original Paper
MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation
arXiv · 2603.22677
The Takeaway
Existing music metrics like FAD are distributional and cannot score individual clips, while high-quality per-sample metrics were previously closed-source. MuQ-Eval lets researchers score individual generated clips, reaching 0.95 system-level correlation with human judgment using a lightweight, real-time model.
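To make "system-level correlation" concrete, here is a minimal sketch of how such a figure is typically computed: per-clip scores are averaged within each generating system, and the per-system means of the metric are then correlated with the per-system means of the human ratings. The function names, the record layout, and the choice of Pearson correlation are assumptions made for illustration; this summary does not state which coefficient or aggregation the paper uses.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def clip_level_correlation(records):
    """Correlate metric and human scores directly over individual clips.

    records: iterable of (system_id, metric_score, human_score), one per clip.
    This record layout is hypothetical, chosen only for this sketch.
    """
    records = list(records)
    return pearson([m for _, m, _ in records], [h for _, _, h in records])


def system_level_correlation(records):
    """Average scores per system first, then correlate the per-system means."""
    metric_by_system = defaultdict(list)
    human_by_system = defaultdict(list)
    for system_id, metric_score, human_score in records:
        metric_by_system[system_id].append(metric_score)
        human_by_system[system_id].append(human_score)

    systems = sorted(metric_by_system)
    metric_means = [sum(metric_by_system[s]) / len(metric_by_system[s]) for s in systems]
    human_means = [sum(human_by_system[s]) / len(human_by_system[s]) for s in systems]
    return pearson(metric_means, human_means)


if __name__ == "__main__":
    # Made-up toy scores, not data from the paper.
    toy = [
        ("sys_a", 1.0, 2.0), ("sys_a", 3.0, 2.0),
        ("sys_b", 4.0, 3.0), ("sys_b", 6.0, 5.0),
        ("sys_c", 7.0, 6.0), ("sys_c", 9.0, 6.0),
    ]
    print("clip-level:  ", clip_level_correlation(toy))
    print("system-level:", system_level_correlation(toy))
```

Because averaging removes clip-to-clip noise within a system, a system-level figure is usually higher than the clip-level one on the same data, so the two should not be compared directly.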
From the abstract
Distributional metrics such as Fréchet Audio Distance cannot score individual music clips and correlate poorly with human judgments, while the only per-sample learned metric achieving high human correlation is closed-source. We introduce MuQ-Eval, an open-source per-sample quality metric for AI-generated music built by training lightweight prediction heads on frozen MuQ-310M features using MusicEval, a dataset of generated clips from 31 text-to-music systems with expert quality ratings. Our simpl