Challenges the 'Golden Data' requirement for video generation by showing that imbalanced data can outperform high-quality data through timestep-aware training.
March 27, 2026
Original Paper
Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training
arXiv · 2603.25527
The Takeaway
The paper shows that models can learn high motion quality and high visual quality from separate, imperfect datasets by decoupling the two quality factors across diffusion timesteps. This significantly lowers the bar for data curation in video foundation models.
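To make the idea of decoupling quality factors across timesteps concrete, here is a minimal sketch of timestep-restricted sampling. It is not the authors' implementation: this summary does not say which timesteps are paired with which kind of data, so the mapping below (high-motion clips train the noisier range, high-visual-quality clips the less noisy range), the split point, and all names (`sample_timestep`, `SPLIT`, the source labels) are assumptions made purely for illustration.

```python
import random

NUM_TIMESTEPS = 1000  # assumed length of the diffusion schedule
SPLIT = 500           # hypothetical boundary between the two timestep ranges


def sample_timestep(source: str, rng: random.Random) -> int:
    """Pick a training timestep restricted by which dataset the clip came from."""
    if source == "high_motion":
        # Assumption: noisier timesteps shape coarse layout and motion.
        return rng.randrange(SPLIT, NUM_TIMESTEPS)
    if source == "high_visual":
        # Assumption: less noisy timesteps refine appearance detail.
        return rng.randrange(0, SPLIT)
    raise ValueError(f"unknown source: {source!r}")


# Each clip in a mixed batch is only trained on the range assigned to its source.
rng = random.Random(0)
batch_sources = ["high_motion", "high_visual", "high_motion", "high_visual"]
timesteps = [sample_timestep(s, rng) for s in batch_sources]
```

Under these assumptions, neither dataset has to be "golden": each one only supervises the part of the denoising trajectory where its strength is presumed to matter.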
From the abstract
Recent advances in video generation models have achieved impressive results. However, these models heavily rely on the use of high-quality data that combines both high visual quality and high motion quality. In this paper, we identify a key challenge in video data curation: the Motion-Vision Quality Dilemma. We discovered that visual quality and motion intensity inherently exhibit a negative correlation, making it hard to obtain golden data that excels in both aspects. To address this challenge,