The best websites are blocking AI from reading them, so future bots are going to be trained on the absolute trash left behind.
As top-tier publishers block AI crawlers, the pool of training data is increasingly filled with low-quality content and misinformation. This 'adverse selection' means that as more high-quality publishers opt out, whatever is left to train on skews further toward garbage.
Adverse Selection in the AI Data Commons
SSRN · 6438640
Generative AI depends on high-quality web content, yet no market compensates its producers. We document adverse selection in this AI data commons: facing a binary opt-out choice, the highest-quality producers exit first, degrading the remaining commons. Studying media and news sites at scale, we find a steep quality-blocking gradient: high-factual outlets block at nearly six times the rate of low-factual sources, with misinformation sources remaining most accessible. Publishers strategically tar…