Localizes reinforcement learning updates for code generation by using execution traces to identify the exact point of semantic failure.
arXiv · March 18, 2026 · 2603.16158
The Takeaway
It solves the coarse credit assignment problem in GRPO-style training where rewards are typically spread uniformly across long programs. By masking tokens downstream of the first execution error, it improves training efficiency and Pass@1 rates on code benchmarks without requiring a critic model or auxiliary losses.
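The masking idea above can be sketched in a few lines. This is a hypothetical helper, not the paper's API: given per-token GRPO advantages and the index of the first token downstream of the execution error, it zeroes the advantage for that token and everything after it, so the update concentrates on the prefix that ran correctly.

```python
def mask_downstream_advantages(advantages, fail_token_idx):
    # Zero the per-token advantage at and after the first execution
    # error, so the GRPO update only touches the preceding tokens.
    # (Illustrative name and interface; not the paper's actual API.)
    if fail_token_idx is None:          # program ran to completion
        return list(advantages)
    return [a if i < fail_token_idx else 0.0
            for i, a in enumerate(advantages)]

# Toy example: a uniform GRPO advantage of -1.0 over 8 tokens, with
# the failure localized at token index 5.
print(mask_downstream_advantages([-1.0] * 8, 5))
# → [-1.0, -1.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0]
```

No critic model is needed: the mask is computed directly from the execution trace, and the surviving advantages are plugged into the standard GRPO objective.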
From the abstract
Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single outcome signal is spread uniformly across long programs even when failure stems from a localized semantic error. We propose Execution-Grounded Credit Assignment (EGCA), which localizes GRPO updates using execution traces. For programs that satisfy algorithmic constraints but fail tests, EGCA executes …
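The execution-trace step can be illustrated with a minimal sketch. EGCA presumably runs candidates against unit tests; here, as a stand-in, a candidate program is executed directly and the traceback is inspected for the first frame inside the generated source, giving the line where execution failed (function name and setup are illustrative assumptions).

```python
import traceback

def first_error_line(program_src):
    # Run a candidate program; return the 1-based line number of the
    # first execution error, or None if it runs cleanly.
    # (Illustrative stand-in for trace-based localization, not EGCA's
    # actual harness, which would run unit tests instead.)
    try:
        exec(compile(program_src, "<candidate>", "exec"), {})
        return None
    except Exception as e:
        frames = [f for f in traceback.extract_tb(e.__traceback__)
                  if f.filename == "<candidate>"]
        return frames[-1].lineno if frames else None

src = "x = 1\ny = x + 1\nz = y / 0\nprint(z)\n"
print(first_error_line(src))  # → 3, the ZeroDivisionError line
```

Mapping that source line back to the token positions that generated it yields the failure index used for the downstream mask.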