ScaleEdit-12M is the largest open-source image editing dataset, opening up high-quality, instruction-based editing data that was previously available only through proprietary models.
March 24, 2026
Original Paper
ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework
arXiv · 2603.20644
The Takeaway
ScaleEdit-12M provides 12 million high-quality instruction-based editing pairs across 23 task families. Practitioners can use it to train custom, high-performance image editing models without the cost or licensing restrictions of closed APIs such as GPT-4V or DALL-E.
From the abstract
Instruction-based image editing has emerged as a key capability for unified multimodal models (UMMs), yet constructing large-scale, diverse, and high-quality editing datasets without costly proprietary APIs remains challenging. Previous image editing datasets either rely on closed-source models for annotation, which prevents cost-effective scaling, or employ fixed synthetic editing pipelines, which suffer from limited quality and generalizability. To address these challenges, we propose ScaleEdi