Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.
arXiv · March 16, 2026 · 2603.12698
Why it matters
Standard coding RL datasets often rely on static, weak test suites that models learn to game. By iteratively evolving test cases against observed model failure modes, this framework provides a more robust reward signal, yielding significant performance gains for 4B-scale models across coding benchmarks.
From the abstract
Reinforcement learning with verifiable rewards (RLVR) is a promising approach for improving code generation in large language models, but its effectiveness is limited by weak and static verification signals in existing coding RL datasets. In this paper, we propose a solution-conditioned and adversarial verification framework that iteratively refines test cases based on the execution behaviors of candidate solutions, with the goal of increasing difficulty, improving discriminative power, and redu…
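To make the core idea concrete, here is a minimal, self-contained sketch of an adversarial test-evolution loop. It is an illustration, not the paper's implementation: the task (sorting), the candidate solutions, the mutation rule, and all function names (`is_discriminative`, `evolve_tests`, etc.) are hypothetical stand-ins. The loop mutates test inputs and keeps only those that split the candidate pool, i.e., tests that at least one candidate passes and at least one fails, which is one simple way to operationalize "discriminative power."

```python
import random

# Toy setting: the "task" is sorting a list. We pretend a model proposed
# two candidate solutions; one is subtly buggy (it drops duplicates).
def correct_sort(xs):
    return sorted(xs)

def buggy_sort(xs):
    return sorted(set(xs))  # loses duplicates — a plausible model error

candidates = [correct_sort, buggy_sort]

def reference(xs):
    # Oracle for expected outputs. The paper conditions test refinement on
    # candidate solutions' execution behavior; a trusted reference stands
    # in for that verification step here.
    return sorted(xs)

def is_discriminative(test_input):
    # A test is discriminative if it separates candidate behaviors:
    # at least one candidate passes it AND at least one fails it.
    expected = reference(test_input)
    outcomes = {c(list(test_input)) == expected for c in candidates}
    return len(outcomes) == 2

def evolve_tests(seed_tests, rounds=200, seed=0):
    # Iteratively mutate test inputs and retain mutants that split the
    # candidate pool, growing a harder, more discriminative test set.
    rng = random.Random(seed)
    pool = [list(t) for t in seed_tests]
    kept = [t for t in pool if is_discriminative(t)]
    for _ in range(rounds):
        base = rng.choice(pool)
        # Mutation targeted at a failure mode: inject a duplicate element.
        mutant = base + [rng.choice(base) if base else 0]
        if is_discriminative(mutant):
            kept.append(mutant)
        pool.append(mutant)
    return kept

seeds = [[3, 1, 2], [5, 4], []]
hard_tests = evolve_tests(seeds)
# Every kept test now rejects the buggy candidate while the correct one
# still passes — a sharper verification signal for the RL reward.
```

In the actual framework, the mutation and verification steps would be driven by the execution behaviors of model-generated solutions rather than a hand-written bug, but the selection pressure is the same: tests survive only if they distinguish solutions.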