Localizes reinforcement learning updates for code generation by using execution traces to identify the exact point of semantic failure.
arXiv · March 18, 2026 · 2603.16158
The Takeaway
It solves the coarse credit assignment problem in GRPO-style training where rewards are typically spread uniformly across long programs. By masking tokens downstream of the first execution error, it improves training efficiency and Pass@1 rates on code benchmarks without requiring a critic model or auxiliary losses.
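The masking idea above can be sketched in a few lines. This is a hypothetical helper, not the paper's API: given per-token GRPO advantages and the index of the first token downstream of the execution error, it zeroes the advantage for that token and everything after it, so the update concentrates on the prefix that ran correctly.

```python
def mask_downstream_advantages(advantages, fail_token_idx):
    # Zero the per-token advantage at and after the first execution
    # error, so the GRPO update only touches the preceding tokens.
    # (Illustrative name and interface; not the paper's actual API.)
    if fail_token_idx is None:          # program ran to completion
        return list(advantages)
    return [a if i < fail_token_idx else 0.0
            for i, a in enumerate(advantages)]

# Toy example: a uniform GRPO advantage of -1.0 over 8 tokens, with
# the failure localized at token index 5.
print(mask_downstream_advantages([-1.0] * 8, 5))
# → [-1.0, -1.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0]
```

No critic model is needed: the mask is computed directly from the execution trace, and the surviving advantages are plugged into the standard GRPO objective.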
From the abstract
Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single outcome signal is spread uniformly across long programs even when failure stems from a localized semantic error. We propose Execution-Grounded Credit Assignment (EGCA), which localizes GRPO updates using execution traces. For programs that satisfy algorithmic constraints but fail tests, EGCA executes …
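The execution-trace step can be illustrated with a minimal sketch. EGCA presumably runs candidates against unit tests; here, as a stand-in, a candidate program is executed directly and the traceback is inspected for the first frame inside the generated source, giving the line where execution failed (function name and setup are illustrative assumptions).

```python
import traceback

def first_error_line(program_src):
    # Run a candidate program; return the 1-based line number of the
    # first execution error, or None if it runs cleanly.
    # (Illustrative stand-in for trace-based localization, not EGCA's
    # actual harness, which would run unit tests instead.)
    try:
        exec(compile(program_src, "<candidate>", "exec"), {})
        return None
    except Exception as e:
        frames = [f for f in traceback.extract_tb(e.__traceback__)
                  if f.filename == "<candidate>"]
        return frames[-1].lineno if frames else None

src = "x = 1\ny = x + 1\nz = y / 0\nprint(z)\n"
print(first_error_line(src))  # → 3, the ZeroDivisionError line
```

Mapping that source line back to the token positions that generated it yields the failure index used for the downstream mask.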