AI & ML Open Release

Releases Feynman, an agentic pipeline and 100k-sample dataset for generating high-quality, knowledge-rich diagrams with grounded captions.

arXiv · March 16, 2026 · 2603.12597

Zixin Wen, Yifu Cai, Kyle Lee, Sam Estep, Josh Sunshine, Aarti Singh, Yuejie Chi, Wode Ni

Why it matters

Visual reasoning data is notoriously hard to scale due to the lack of high-quality image-text alignment in technical domains. This release provides a scalable way to synthesize complex diagrammatic data for training vision-language models.

From the abstract

Visual design is an essential application of state-of-the-art multi-modal AI systems. Improving these systems requires high-quality vision-language data at scale. Despite the abundance of internet image and text data, knowledge-rich and well-aligned image-text pairs are rare. In this paper, we present a scalable diagram generation pipeline built with our agent, Feynman. To create diagrams, Feynman first enumerates domain-specific knowledge components (''ideas'') and performs code planning based