MoCA3D predicts 3D bounding boxes from monocular images without requiring any camera intrinsics at inference time.
March 23, 2026
Original Paper
MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane
arXiv · 2603.19538
The Takeaway
Almost all monocular 3D detection methods rely on known camera calibration (intrinsics such as focal length and principal point). MoCA3D breaks that dependency by formulating 3D understanding as dense prediction in the image plane, which enables 3D object detection on in-the-wild images where camera metadata is missing.
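To make the role of calibration concrete, here is a minimal sketch of the standard pinhole camera model, not MoCA3D's method. It shows how the eight corners of a 3D box map to image-plane points plus per-corner depths when intrinsics are known. The function names, the axis convention (x right, y down, z forward), and the example intrinsics are all assumptions chosen for illustration.

```python
import math

def box_corners_3d(center, dims, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumed convention: x right, y down, z forward; yaw rotates about y.
    """
    cx, cy, cz = center
    length, height, width = dims
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Corner offset in the box's own frame.
                x, y, z = sx * length, sy * height, sz * width
                # Rotate the offset about the vertical axis by yaw.
                xr = math.cos(yaw) * x + math.sin(yaw) * z
                zr = -math.sin(yaw) * x + math.cos(yaw) * z
                corners.append((cx + xr, cy + y, cz + zr))
    return corners

def project_corners(corners, fx, fy, px, py):
    """Pinhole projection: return (u, v, depth) for each 3D corner."""
    projected = []
    for x, y, z in corners:
        if z <= 0:
            raise ValueError("corner is behind the camera")
        projected.append((fx * x / z + px, fy * y / z + py, z))
    return projected

def backproject(u, v, depth, fx, fy, px, py):
    """Invert the projection; this step is where intrinsics are unavoidable."""
    return ((u - px) * depth / fx, (v - py) * depth / fy, depth)

# Hypothetical box 10 m in front of the camera and made-up intrinsics.
corners = box_corners_3d((0.0, 0.0, 10.0), (4.0, 1.5, 2.0), 0.0)
projected = project_corners(corners, 1000.0, 1000.0, 640.0, 360.0)
for u, v, depth in projected:
    print(f"u={u:.1f} v={v:.1f} depth={depth:.1f}")
```

The `(u, v, depth)` triples are the kind of image-plane quantities the paper describes predicting directly. A pipeline that lifts a 2D region to a metric 3D box first would need `fx`, `fy`, `px`, `py` to obtain them.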
From the abstract
Monocular 3D object understanding has largely been cast as a 2D RoI-to-3D box lifting problem. However, emerging downstream applications require image-plane geometry (e.g., projected 3D box corners) which cannot be easily obtained without known intrinsics, a problem for object detection in the wild. We introduce MoCA3D, a Monocular, Class-Agnostic 3D model that predicts projected 3D bounding box corners and per-corner depths without requiring camera intrinsics at inference time. MoCA3D formulate