AI & ML New Capability

A unified reinforcement learning framework that jointly optimizes reasoning (text) and synthesis (image) for interleaved multimodal generation.

March 25, 2026

Original Paper

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan Nie, Weilin Huang, Wanli Ouyang

arXiv · 2603.23500

The Takeaway

As models move toward 'o1-style' reasoning for visual tasks, training them with reinforcement learning becomes difficult. UniGRPO integrates Flow Matching with GRPO, providing a stable reward signal (MSE on velocity fields) that scales reasoning-driven image generation without reward hacking.
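To make the mechanism concrete, here is a minimal sketch of how a GRPO-style group advantage could be computed from a flow-matching MSE reward. This is an illustration under assumptions, not the paper's implementation: the function names, the negative-MSE reward form, and the toy velocity fields are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_reward(pred_velocity, target_velocity):
    """Reward as negative MSE between predicted and target velocity fields.
    A dense, well-defined signal; hypothetical form, not the paper's exact one."""
    return -np.mean((pred_velocity - target_velocity) ** 2)

def grpo_advantages(rewards):
    """GRPO-style advantage: normalize rewards within a group of G rollouts
    for the same prompt (mean/std baseline, no learned critic)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Toy group of G=4 rollouts, each predicting a velocity field on a 2x8x8 grid;
# noise scale stands in for rollout quality.
target = rng.normal(size=(2, 8, 8))
rollouts = [target + rng.normal(scale=s, size=target.shape)
            for s in (0.1, 0.3, 0.6, 1.0)]
rewards = [flow_matching_reward(v, target) for v in rollouts]
adv = grpo_advantages(rewards)
# Rollouts whose velocity fields are closer to the target receive
# higher advantage, which then weights the policy-gradient update.
```

The group-relative baseline is what lets GRPO dispense with a value network: advantages are zero-mean within each group, so only relative rollout quality drives the update.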

From the abstract

Unified models capable of interleaved generation have emerged as a promising paradigm, with the community increasingly converging on autoregressive modeling for text and flow matching for image generation. To advance this direction, we propose a unified reinforcement learning framework tailored for interleaved generation. We validate our approach on its fundamental unit: a single round of reasoning-driven image generation, where the model first expands the user prompt through reasoning, followed …