AI & ML Paradigm Shift

The paper proposes dense point trajectories as universal 'visual tokens' for behavior that generalize across species and non-rigid objects.

April 2, 2026

Original Paper

Forecasting Motion in the Wild

Neerja Thakkar, Shiry Ginosar, Jacob Walker, Jitendra Malik, Joao Carreira, Carl Doersch

arXiv · 2604.01015

The Takeaway

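The paper shifts behavior prediction away from category-specific labels (e.g., 'cat jumping') toward a mid-level motion representation built from point trajectories. Because this representation encodes how points move rather than what the agent looks like, it supports category-agnostic motion forecasting that can extend to rare species and complex morphologies, where label-driven models tend to struggle.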
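To make the idea of a category-agnostic motion token concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the helper names `to_motion_token` and `mean_pool`, the displacement encoding, and the mean pooling are all hypothetical choices made for this example. Each tracked point becomes a token of per-step displacements, and the set of tokens is summarized in a way that does not depend on their order.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # one tracked point's (x, y) positions over time


def to_motion_token(track: Trajectory) -> List[float]:
    """Flatten one trajectory into per-step displacements (dx, dy).

    Hypothetical encoding: only coordinates are used, so the token carries
    motion but no appearance or category label.
    """
    token: List[float] = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        token.extend([x1 - x0, y1 - y0])
    return token


def mean_pool(tokens: List[List[float]]) -> List[float]:
    """Element-wise mean over a set of tokens (an order-independent summary)."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


# Two toy trajectories, each observed over three frames.
tracks = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
    [(5.0, 5.0), (5.0, 6.0), (5.0, 8.0)],
]

tokens = [to_motion_token(t) for t in tracks]
summary = mean_pool(tokens)

# Reordering the trajectories leaves the pooled summary unchanged.
shuffled_summary = mean_pool(list(reversed(tokens)))
```

Nothing in the tokens says whether the tracked points belong to a cat, a bird, or a piece of cloth, which is the sense in which such a representation is category-agnostic. Mean pooling stands in here only to show order independence; the abstract describes a diffusion transformer over unordered trajectory sets, which this sketch does not attempt to reproduce.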

From the abstract

Visual intelligence requires anticipating the future behavior of agents, yet vision systems lack a general representation for motion and behavior. We propose dense point trajectories as visual tokens for behavior, a structured mid-level representation that disentangles motion from appearance and generalizes across diverse non-rigid agents, such as animals in-the-wild. Building on this abstraction, we design a diffusion transformer that models unordered sets of trajectories and explicitly reasons