AI & ML New Capability

This physics-informed VLM framework improves physics-grounded anomaly detection AUROC from 66.9% to 96.7%.

arXiv · March 17, 2026 · 2603.15237

Yao Gu, Xiaohao Xu, Yingna Wu

The Takeaway

Current VLMs excel at appearance but fail at physical dynamics (e.g., irregular rotations). By decomposing causal reasoning into multi-turn dialogues with structured physical priors, this paper demonstrates a massive jump in a model's ability to understand mechanical constraints.

From the abstract

Vision-Language Models (VLMs) demonstrate strong general-purpose reasoning but remain limited in physics-grounded anomaly detection, where causal understanding of dynamics is essential. Existing VLMs, trained predominantly on appearance-centric correlations, fail to capture kinematic constraints, leading to poor performance on anomalies such as irregular rotations or violated mechanical motions. We introduce a physics-informed instruction tuning framework that explicitly encodes object propertie

Read the original paper →

← Back to today's papers