Provides a 2.5M image-to-TikZ dataset and the first instruction-augmented dataset for geometric visual reasoning.
March 25, 2026
Original Paper
GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning
arXiv · 2603.22687
The Takeaway
GeoTikzBridge makes fine-grained geometric perception more accessible by releasing a dataset 16x larger than existing open-source alternatives. The framework lets MLLMs serve as plug-and-play modules that solve complex geometric problems through TikZ code generation.
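To make "image-to-TikZ" concrete, here is a minimal sketch of the kind of TikZ code that could describe a simple labeled figure. It is illustrative only: the function name `polygon_to_tikz` and the point/label input format are assumptions for this example, not the paper's data format or pipeline.

```python
def polygon_to_tikz(points, labels):
    """Emit a minimal TikZ picture for a closed, labeled polygon.

    Hypothetical helper for illustration; not taken from GeoTikzBridge.
    """
    lines = [r"\begin{tikzpicture}"]
    # Join the vertices into one closed path.
    path = " -- ".join(f"({x},{y})" for x, y in points)
    lines.append(rf"  \draw {path} -- cycle;")
    # Place a text label above each vertex.
    for (x, y), name in zip(points, labels):
        lines.append(rf"  \node[above] at ({x},{y}) {{{name}}};")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


if __name__ == "__main__":
    # A 3-4-5 right triangle with vertices A, B, C.
    print(polygon_to_tikz([(0, 0), (4, 0), (0, 3)], ["A", "B", "C"]))
```

An image-to-TikZ model works in the opposite direction: given a rendered figure, it has to recover code like this, which exposes coordinates and incidence relations explicitly to a downstream reasoner.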
From the abstract
Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities. However, they struggle to perceive fine-grained geometric structures, constraining their ability of geometric understanding and visual reasoning. To address this, we propose GeoTikzBridge, a framework that enhances local geometric perception and visual reasoning through tikz-based code generation. Within this framework, we build two models supported by two complementary datasets. Th