Bootstraps reasoning-heavy RL by stochastically injecting few-shot demonstrations into training prompts via a curriculum.
March 20, 2026
Original Paper
Context Bootstrapped Reinforcement Learning
arXiv · 2603.18953
The Takeaway
CBRL addresses the 'cold start' problem in Reinforcement Learning from Verifiable Rewards (RLVR), where models fail to find any correct reasoning path and therefore receive no learning signal. By annealing the demonstrations away over training, it pushes the model to internalize complex reasoning patterns rather than depend on the in-context examples, significantly improving success rates on novel reasoning tasks.
From the abstract
Reinforcement Learning from Verifiable Rewards (RLVR) suffers from exploration inefficiency, where models struggle to generate successful rollouts, resulting in minimal learning signal. This challenge is particularly severe for tasks that require the acquisition of novel reasoning patterns or domain-specific knowledge. To address this, we propose Context Bootstrapped Reinforcement Learning (CBRL), which augments RLVR training by stochastically prepending few-shot demonstrations to training prompts.
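To make the core mechanism concrete, here is a minimal sketch of stochastic demonstration injection with an annealed curriculum. The function name, the linear schedule, and all parameter names are assumptions for illustration; the paper may use a different schedule or injection scheme.

```python
import random

def build_prompt(task, demos, step, total_steps,
                 p_start=1.0, p_end=0.0, k=2, rng=random):
    """Sketch of CBRL-style prompt construction (hypothetical API).

    With probability p (annealed linearly from p_start to p_end over
    training), prepend k few-shot demonstrations to the task prompt;
    otherwise return the bare task, as in standard RLVR.
    """
    frac = min(step / total_steps, 1.0)
    p = p_start + (p_end - p_start) * frac  # linear anneal: p_start -> p_end
    if rng.random() < p:
        shots = rng.sample(demos, min(k, len(demos)))
        return "\n\n".join(shots) + "\n\n" + task
    return task
```

Early in training (p near 1.0) nearly every rollout sees demonstrations, which gives the policy at least some successful trajectories to learn from; by the end (p near 0.0) the model must solve bare prompts, which is what drives internalization of the demonstrated reasoning.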