Introduces the largest multi-modal CAD dataset to date, with 10 million annotations for 1 million models, to enable geometric deep learning on boundary representation (BRep) data.
arXiv · March 16, 2026 · 2603.12605
Why it matters
At 5 terabytes, this dataset broadens access to training foundation models for industrial reverse engineering and CAD modeling, fields previously held back by a lack of large-scale, structured 3D data.
From the abstract
Reverse engineering and rapid prototyping of computer-aided design (CAD) models from 3D scans, sketches, or simple text prompts are vital in industrial product design. However, recent advances in geometric deep learning techniques lack a multi-modal understanding of parametric CAD features stored in their boundary representation (BRep). This study presents the largest compilation of 10 million multi-modal annotations and metadata for 1 million ABC CAD models, namely A2Z, to unlock an unprecedent…