Proposes Causal Process Reward (CPR) to fix 'cherry-picking' in MLLM reasoning by coupling answer correctness with step-level logical alignment.
arXiv · March 16, 2026 · 2603.13099
Why it matters
The CPR-Curriculum approach improves reasoning fidelity by 32% via GRPO without requiring manual step-by-step annotation, addressing a major bottleneck in scaling reliable reasoning models.
From the abstract
We introduce **CRYSTAL** (**C**lear **R**easoning via **Y**ielded **S**teps, **T**raceability and **L**ogic), a diagnostic benchmark with 6,372 instances that evaluates multimodal reasoning through verifiable intermediate steps. We propose two complementary metrics: *Match F1*, which scores step-level precision and recall via semantic similarity matching, and *Ordered Match F1*, which further penalizes disordered reasoning chains. References are constructed through a Delphi-inspired pipeline w
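To make the two metrics concrete, here is a minimal sketch of what a Match F1 / Ordered Match F1 computation could look like. This is an illustrative toy, not the paper's implementation: it stands in for semantic similarity with `difflib.SequenceMatcher`, pairs steps greedily above a threshold, and implements the ordered variant as a longest-increasing-subsequence filter over matched reference indices; the actual similarity model, matching procedure, and threshold in CRYSTAL may differ.

```python
from difflib import SequenceMatcher


def _sim(a: str, b: str) -> float:
    # Stand-in for semantic similarity (the paper likely uses an
    # embedding model); character-level ratio via difflib.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def _ordered_subset(matches):
    # Keep only matches whose reference indices form a longest
    # increasing subsequence, penalizing disordered chains (O(n^2) DP).
    n = len(matches)
    if n == 0:
        return []
    length, parent = [1] * n, [-1] * n
    for i in range(n):
        for j in range(i):
            if matches[j][1] < matches[i][1] and length[j] + 1 > length[i]:
                length[i], parent[i] = length[j] + 1, j
    i = max(range(n), key=lambda k: length[k])
    seq = []
    while i != -1:
        seq.append(matches[i])
        i = parent[i]
    return seq[::-1]


def match_f1(pred_steps, ref_steps, threshold=0.7, ordered=False):
    """Toy Match F1: greedily pair each predicted step with the most
    similar unused reference step whose similarity clears `threshold`."""
    used, matches = set(), []  # matches: (pred_idx, ref_idx) pairs
    for i, p in enumerate(pred_steps):
        best, best_j = 0.0, None
        for j, r in enumerate(ref_steps):
            if j not in used and _sim(p, r) > best:
                best, best_j = _sim(p, r), j
        if best_j is not None and best >= threshold:
            used.add(best_j)
            matches.append((i, best_j))
    if ordered:
        matches = _ordered_subset(matches)
    m = len(matches)
    prec = m / len(pred_steps) if pred_steps else 0.0
    rec = m / len(ref_steps) if ref_steps else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

A reasoning chain that states the right steps in the wrong order scores the same Match F1 but a lower Ordered Match F1, which is the gap the second metric is designed to expose.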