AI & ML Paradigm Shift

GSB-PPO lifts proximal policy optimization from discrete action steps to full generation trajectories by framing the generative process as a Generalized Schrödinger Bridge.

March 24, 2026

Original Paper

Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective

Yuehu Gong, Zeyuan Wang, Yulin Chen, Yanwei Fu

arXiv · 2603.21621

The Takeaway

GSB-PPO provides a unified mathematical framework for training generative policies (such as diffusion or flow-based models) with on-policy RL. It resolves the mismatch between PPO's per-action probability ratios and the path-space nature of modern generative processes, where a single "action" is an entire denoising trajectory.
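The mismatch can be made concrete. Standard PPO clips a ratio of per-action probabilities; a path-space formulation instead clips a ratio of whole-trajectory likelihoods, obtained by summing the per-step log-probabilities of the generative process. The sketch below illustrates that idea only; it is a minimal NumPy toy, not the paper's GSB-PPO objective, and the function names and the clipping constant are assumptions.

```python
import numpy as np


def trajectory_ratio(logp_new: np.ndarray, logp_old: np.ndarray) -> np.ndarray:
    """Likelihood ratio over full generation trajectories.

    Each row of logp_* holds per-step log-probabilities along one
    trajectory; summing over steps gives the trajectory log-likelihood,
    so the ratio is exp(sum_new - sum_old) per trajectory.
    """
    return np.exp(logp_new.sum(axis=-1) - logp_old.sum(axis=-1))


def clipped_surrogate(logp_new: np.ndarray,
                      logp_old: np.ndarray,
                      advantages: np.ndarray,
                      eps: float = 0.2) -> float:
    """PPO-style clipped surrogate, with the ratio taken in path space."""
    ratio = trajectory_ratio(logp_new, logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (min) combination, exactly as in standard PPO,
    # but each ratio now covers an entire trajectory.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

When the new and old policies coincide, every trajectory ratio is 1 and the surrogate reduces to the mean advantage, matching the behavior of the usual per-action objective.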

From the abstract

On-policy reinforcement learning with generative policies is promising but remains underexplored. A central challenge is that proximal policy optimization (PPO) is traditionally formulated in terms of action-space probability ratios, whereas diffusion- and flow-based policies are more naturally represented as trajectory-level generative processes. In this work, we propose GSB-PPO, a path-space formulation of generative PPO inspired by the Generalized Schrödinger Bridge (GSB). Our framework lifts …