Proposes dense point trajectories as universal 'visual tokens' for behavior, a representation that generalizes across species and other non-rigid agents.
April 2, 2026
Original Paper
Forecasting Motion in the Wild
arXiv · 2604.01015
The Takeaway
The paper shifts behavior prediction away from category-specific labels (e.g., 'cat jumping') toward a mid-level motion representation. This enables category-agnostic motion forecasting that is meant to generalize to rare species and complex morphologies, where models tied to category labels tend to break down.
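To make the idea of trajectories as tokens concrete, here is a minimal sketch. It is not the paper's tokenization, which this summary does not describe. It assumes tracks arrive as an array of shape (N, T, 2) holding N tracked points over T frames, and the function names `trajectories_to_tokens` and `pooled_summary` are hypothetical. Each track becomes one token built only from positions and displacements, so no appearance or category label enters the representation, and a mean over tokens shows that a set-style summary does not depend on track order.

```python
import numpy as np

def trajectories_to_tokens(tracks):
    """Turn point tracks into one motion token per track.

    tracks: array of shape (N, T, 2), N tracked points over T frames,
    each entry an (x, y) position. This layout is an assumption.
    """
    tracks = np.asarray(tracks, dtype=float)
    n = len(tracks)
    start = tracks[:, 0, :]             # (N, 2) initial position of each track
    deltas = np.diff(tracks, axis=1)    # (N, T-1, 2) frame-to-frame displacement
    # Token = start position followed by the flattened displacements.
    return np.concatenate([start, deltas.reshape(n, -1)], axis=1)

def pooled_summary(tokens):
    """Mean over tokens: unchanged by any reordering of the tracks."""
    return tokens.mean(axis=0)

# Toy data: 5 tracks over 4 frames.
rng = np.random.default_rng(0)
tracks = rng.uniform(0, 100, size=(5, 4, 2))
tokens = trajectories_to_tokens(tracks)

# Shuffling the tracks leaves the pooled summary the same.
perm = rng.permutation(len(tracks))
shuffled_tokens = trajectories_to_tokens(tracks[perm])
same = np.allclose(pooled_summary(tokens), pooled_summary(shuffled_tokens))
```

Order-independence matters here because a set of tracked points has no natural ordering. A model that consumes such tokens, for example a transformer without positional encoding over the token axis, can treat them as an unordered set.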
From the abstract
Visual intelligence requires anticipating the future behavior of agents, yet vision systems lack a general representation for motion and behavior. We propose dense point trajectories as visual tokens for behavior, a structured mid-level representation that disentangles motion from appearance and generalizes across diverse non-rigid agents, such as animals in-the-wild. Building on this abstraction, we design a diffusion transformer that models unordered sets of trajectories and explicitly reasons