AI & ML Paradigm Shift

PRM-as-a-Judge shifts robotic evaluation from binary success/failure to dense, potential-based progress metrics.

March 24, 2026

Original Paper

PRM-as-a-Judge: A Dense Evaluation Paradigm for Fine-Grained Robotic Auditing

Yuheng Ji, Yuyang Liu, Huajie Tan, Xuchuan Huang, Fanding Huang, Yijie Xu, Cheng Chi, Yuting Zhao, Huaihai Lyu, Peterson Co, Mingyu Cao, Qiongyu Zhang, Zhe Li, Enshen Zhou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Xiaolong Zheng

arXiv · 2603.21669

The Takeaway

Using Process Reward Models (PRMs) to audit trajectory videos enables fine-grained diagnosis of long-horizon tasks. Researchers can pinpoint where in an execution a policy fails or becomes inefficient, rather than only recording that it failed.
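
To make the contrast with a binary success rate concrete, here is a minimal sketch of what a dense audit could compute once a PRM has produced a per-frame progress estimate. Everything in it is an assumption made for illustration: the function name `audit_progress`, the `stall_eps` threshold, the summary fields, and the convention that progress lies in [0, 1]. None of it is taken from the paper, and the paper's own metrics may be defined differently.

```python
from typing import Dict, List


def audit_progress(progress: List[float], stall_eps: float = 0.01) -> Dict[str, object]:
    """Summarize a per-frame progress curve (hypothetical values in [0, 1]).

    `progress[t]` stands in for a PRM's estimate of task progress at frame t.
    The summary fields below are illustrative, not the paper's metrics.
    """
    if not progress:
        raise ValueError("progress must contain at least one estimate")

    # Potential-style step signal: change in estimated progress between frames.
    deltas = [later - earlier for earlier, later in zip(progress, progress[1:])]

    # Frames at which estimated progress dropped by more than the threshold.
    regression_steps = [i + 1 for i, d in enumerate(deltas) if d < -stall_eps]

    peak = max(progress)
    return {
        "final_progress": progress[-1],
        "peak_progress": peak,
        "peak_step": progress.index(peak),  # first frame reaching the peak
        "total_regression": sum(-d for d in deltas if d < 0),
        "regression_steps": regression_steps,
    }


if __name__ == "__main__":
    # Made-up progress curve for a single episode, for demonstration only.
    report = audit_progress([0.0, 0.25, 0.5, 0.25, 0.75])
    print(report)
```

For the made-up episode above, a binary metric would log a single failure, while this kind of summary also records how far the policy got and at which frame it lost ground.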

From the abstract

Current robotic evaluation is still largely dominated by binary success rates, which collapse rich execution processes into a single outcome and obscure critical qualities such as progress, efficiency, and stability. To address this limitation, we propose PRM-as-a-Judge, a dense evaluation paradigm that leverages Process Reward Models (PRMs) to audit policy execution directly from trajectory videos by estimating task progress from observation sequences. Central to this paradigm is the OPD (Outco