AI & ML Open Release

Introduces the first high-performing open-source metric for per-sample AI music quality evaluation.

March 25, 2026

Original Paper

MuQ-Eval: An Open-Source Per-Sample Quality Metric for AI Music Generation Evaluation

Di Zhu, Zixuan Li

arXiv · 2603.22677

The Takeaway

Existing music metrics such as FAD are distributional and cannot score individual clips, while the only high-quality per-sample metric was closed-source. MuQ-Eval closes that gap: researchers can score individual generated clips with a lightweight model that runs in real time and reaches 0.95 system-level correlation with human judgment.
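To make "system-level correlation" concrete, here is a minimal sketch of how such a figure is typically computed: per-clip scores are averaged within each generating system, and the resulting system means are correlated with the mean human ratings. This is an illustration under stated assumptions, not the paper's evaluation code. The function names, the record layout, and the choice of a Pearson coefficient are all hypothetical; the summary does not say which coefficient produced the 0.95 figure.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def system_level_correlation(records):
    """Correlate per-system mean metric scores with per-system mean human ratings.

    `records` is an iterable of (system_id, metric_score, human_rating)
    tuples, one per clip. This layout is an assumption for illustration.
    """
    metric_by_system = defaultdict(list)
    human_by_system = defaultdict(list)
    for system_id, metric_score, human_rating in records:
        metric_by_system[system_id].append(metric_score)
        human_by_system[system_id].append(human_rating)

    # Fix one ordering of systems so both lists of means line up.
    systems = sorted(metric_by_system)
    metric_means = [sum(metric_by_system[s]) / len(metric_by_system[s]) for s in systems]
    human_means = [sum(human_by_system[s]) / len(human_by_system[s]) for s in systems]
    return pearson(metric_means, human_means)


# Toy data: three hypothetical systems, two clips each (not from the paper).
toy_records = [
    ("sys_a", 1.0, 2.0), ("sys_a", 2.0, 2.0),
    ("sys_b", 3.0, 3.0), ("sys_b", 4.0, 3.0),
    ("sys_c", 5.0, 4.0), ("sys_c", 6.0, 4.0),
]
toy_correlation = system_level_correlation(toy_records)
```

Because averaging smooths out clip-to-clip noise, a system-level figure is usually higher than the corresponding clip-level correlation, so the two should not be compared directly.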

From the abstract

Distributional metrics such as Fréchet Audio Distance cannot score individual music clips and correlate poorly with human judgments, while the only per-sample learned metric achieving high human correlation is closed-source. We introduce MuQ-Eval, an open-source per-sample quality metric for AI-generated music built by training lightweight prediction heads on frozen MuQ-310M features using MusicEval, a dataset of generated clips from 31 text-to-music systems with expert quality ratings. Our simpl