AI & ML · Breaks Assumption

MoCA3D predicts 3D bounding boxes from monocular images without requiring any camera intrinsics at inference time.

March 23, 2026

Original Paper

MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane

Changwoo Jeon, Rishi Upadhyay, Achuta Kadambi

arXiv · 2603.19538

The Takeaway

Almost all monocular 3D detection methods rely on known camera calibration, i.e., the intrinsics (focal length, principal point). MoCA3D removes that dependency by casting 3D understanding as dense prediction in the image plane. This enables 3D object detection on "in the wild" images whose camera metadata is missing.
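The following minimal sketch makes the distinction concrete. It models the kind of target MoCA3D is described as predicting: projected box corners plus a depth for each corner, under a standard pinhole camera. It illustrates the geometry only and is not the paper's implementation. It makes three assumptions that do not come from the paper:

- the box is axis-aligned in the camera frame (no rotation);
- depth means distance along the optical axis (z);
- the names `PinholeIntrinsics`, `box_corners`, `project`, and `unproject` are hypothetical.

The (u, v, depth) triples live entirely in pixel space. Turning them back into metric 3D points, however, requires fx, fy, cx, and cy.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PinholeIntrinsics:
    """Hypothetical container for pinhole intrinsics, all in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box in camera coordinates.

    Assumed convention: x right, y down, z forward (meters).
    `size` is (width, height, length) along (x, y, z).
    """
    x0, y0, z0 = center
    w, h, l = size
    return [
        (x0 + sx * w / 2, y0 + sy * h / 2, z0 + sz * l / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project(corners, K):
    """Map 3D camera-frame corners to image-plane targets (u, v, depth).

    Computing the targets from a known 3D box uses K; the targets themselves
    are pixel coordinates plus a per-corner z-depth.
    """
    targets = []
    for x, y, z in corners:
        if z <= 0:
            raise ValueError("corner lies behind the camera")
        targets.append((K.fx * x / z + K.cx, K.fy * y / z + K.cy, z))
    return targets


def unproject(targets, K):
    """Lift (u, v, depth) targets back to metric 3D corners. This step needs K."""
    return [
        ((u - K.cx) * z / K.fx, (v - K.cy) * z / K.fy, z)
        for u, v, z in targets
    ]


if __name__ == "__main__":
    K_true = PinholeIntrinsics(fx=800.0, fy=800.0, cx=320.0, cy=240.0)
    K_wrong = PinholeIntrinsics(fx=1000.0, fy=1000.0, cx=320.0, cy=240.0)

    corners = box_corners(center=(1.0, 0.5, 10.0), size=(2.0, 1.5, 4.0))
    targets = project(corners, K_true)

    # With the right intrinsics the 3D box is recovered; with a guessed
    # focal length the same pixel-space targets give a differently sized box.
    print(unproject(targets, K_true)[-1])
    print(unproject(targets, K_wrong)[-1])
```

Running the same pixel-space targets through two different focal lengths yields two different metric boxes. This is why pipelines that lift a 2D RoI to a 3D box need calibration, and why a model whose output stops at (u, v, depth) can run without it.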

From the abstract

Monocular 3D object understanding has largely been cast as a 2D RoI-to-3D box lifting problem. However, emerging downstream applications require image-plane geometry (e.g., projected 3D box corners) which cannot be easily obtained without known intrinsics, a problem for object detection in the wild. We introduce MoCA3D, a Monocular, Class-Agnostic 3D model that predicts projected 3D bounding box corners and per-corner depths without requiring camera intrinsics at inference time. MoCA3D formulate