Discovers that video diffusion models commit to high-level plans in the first few denoising steps, enabling a new inference-time scaling technique called ChEaP.
April 1, 2026
Original Paper
Video Models Reason Early: Exploiting Plan Commitment for Maze Solving
arXiv · 2603.30043
The Takeaway
Because the core trajectory is decided early, the authors show that unpromising seeds can be abandoned ('early exit') or branched before compute is spent on full video generation. This lets video models solve complex, long-horizon tasks such as mazes that were previously thought impossible for diffusion architectures.
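The compute argument can be illustrated with a toy sketch. This is not the paper's ChEaP implementation: `denoise_step`, `plan_score`, the step counts, and the scalar "state" are all hypothetical stand-ins, chosen only to show how pruning seeds after a few denoising steps reduces the number of denoising calls compared with running every seed to completion.

```python
import random

TOTAL_STEPS = 50  # assumed length of the full denoising schedule
EARLY_STEPS = 5   # assumed number of steps after which the plan is fixed


def denoise_step(state, step, rng):
    """Hypothetical stand-in for one denoising step of a video model."""
    # Early steps move the state a lot (the "plan"); later steps only refine it.
    scale = 1.0 if step < EARLY_STEPS else 0.01
    return state + scale * rng.gauss(0.0, 1.0)


def plan_score(state):
    """Hypothetical verifier: higher is better (state closer to the target 0)."""
    return -abs(state)


def generate_with_early_pruning(seeds, keep):
    """Run every seed for EARLY_STEPS, then finish only the `keep` best."""
    calls = 0
    partial = []
    for seed in seeds:
        rng = random.Random(seed)
        state = 0.0
        for step in range(EARLY_STEPS):
            state = denoise_step(state, step, rng)
            calls += 1
        partial.append((plan_score(state), seed, state, rng))

    # Keep only the seeds whose early plan looks most promising.
    partial.sort(key=lambda item: item[0], reverse=True)
    survivors = partial[:keep]

    finished = []
    for _, seed, state, rng in survivors:
        for step in range(EARLY_STEPS, TOTAL_STEPS):
            state = denoise_step(state, step, rng)
            calls += 1
        finished.append((plan_score(state), seed))

    best_score, best_seed = max(finished)
    return best_seed, best_score, calls


best_seed, best_score, calls = generate_with_early_pruning(list(range(16)), keep=2)
print(best_seed, best_score, calls)
```

With 16 seeds and 2 survivors, the sketch spends 16 × 5 + 2 × 45 = 170 denoising calls, against 16 × 50 = 800 for running every seed in full. The saving only pays off if the early score is a reliable predictor of the final result, which is what the early plan commitment finding suggests.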
From the abstract
Video diffusion models exhibit emergent reasoning capabilities like solving mazes and puzzles, yet little is understood about how they reason during generation. We take a first step towards understanding this and study the internal planning dynamics of video models using 2D maze solving as a controlled testbed. Our investigations reveal two findings. Our first finding is early plan commitment: video diffusion models commit to a high-level motion plan within the first few denoising steps, after w