AI & ML Paradigm Shift

Challenges the 'filter-first' data paradigm by showing that training on uncurated data with quality-score labels outperforms training on high-quality filtered subsets.

March 31, 2026

Original Paper

LACON: Training Text-to-Image Model from Uncurated Data

Zhiyang Liang, Ziyu Wan, Hongyu Liu, Dong Chen, Qiu Shen, Hao Zhu, Dongdong Chen

arXiv · 2603.26866

The Takeaway

Instead of aggressively discarding 'bad' data, the LACON framework teaches the model the explicit boundary between high- and low-quality data. This approach lets generative models leverage the full distribution of available data, achieving better results at the same compute budget by improving the model's understanding of aesthetic and structural quality markers.
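The excerpt does not show LACON's actual implementation, but the core idea of labeling-and-conditioning can be sketched as follows: instead of filtering, tag every sample with a discrete quality label derived from its quality score, and let the model condition on that label (here, as a token prepended to the caption). All names, thresholds, and token formats below are assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch of a labeling-and-conditioning data pipeline.
# Thresholds, token strings, and function names are illustrative assumptions.

def quality_token(score, thresholds=(0.3, 0.7)):
    """Map a continuous quality score in [0, 1] to a discrete label token."""
    lo, hi = thresholds
    if score >= hi:
        return "<quality:high>"
    if score >= lo:
        return "<quality:medium>"
    return "<quality:low>"

def label_caption(caption, score):
    """Prepend the quality token so the model can condition on it."""
    return f"{quality_token(score)} {caption}"

# Training: keep ALL samples, tagging each with its quality label
# rather than discarding low-scoring ones.
dataset = [
    ("a cat on a windowsill", 0.92),
    ("blurry photo of a street", 0.15),
    ("sunset over mountains", 0.55),
]
labeled = [label_caption(caption, score) for caption, score in dataset]

# Inference: explicitly request the high-quality mode.
prompt = label_caption("a watercolor fox", 1.0)
```

At inference time the label acts as a control signal: conditioning on the high-quality token steers generation toward the high-quality region the model learned, while the low-quality data still contributed gradient signal during training.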

From the abstract

The success of modern text-to-image generation is largely attributed to massive, high-quality datasets. Currently, these datasets are curated through a filter-first paradigm that aggressively discards low-quality raw data based on the assumption that it is detrimental to model performance. Is the discarded bad data truly useless, or does it hold untapped potential? In this work, we critically re-examine this question. We propose LACON (Labeling-and-Conditioning), a novel training framework that […]