This paper shows that pretrained monocular models can perform multi-view human mesh recovery without camera calibration or multi-view training data.
March 24, 2026
Original Paper
Monocular Models are Strong Learners for Multi-View Human Mesh Recovery
arXiv · 2603.20391
The Takeaway
The paper challenges the assumption that multi-view human mesh recovery requires complex multi-view supervision and careful camera calibration. By using single-view models as strong priors and refining via anatomical consistency, it enables high-accuracy 3D recovery in in-the-wild scenarios where calibration is impossible.
From the abstract
Multi-view human mesh recovery (HMR) is broadly deployed in diverse domains where high accuracy and strong generalization are essential. Existing approaches can be broadly grouped into geometry-based and learning-based methods. However, geometry-based methods (e.g., triangulation) rely on cumbersome camera calibration, while learning-based approaches often generalize poorly to unseen camera configurations due to the lack of multi-view training data, limiting their performance in real-world scena