Economics · Paradigm Challenge

Training algorithms on 'fake' (synthetic) data makes them much better at finding and helping poor people in the real world.

March 24, 2026

Original Paper

Optimal Poverty Prediction Under Measurement Error: Theory and Evidence from Augmented Machine Learning

Werner Hernani-Limarino

SSRN · 6332698

The Takeaway

Standard poverty surveys measure household welfare with so much noise that machine learning models struggle to learn from them. The author found that 'augmenting' the training data through mathematical interpolation lowers the resulting error floors and improves the impact of social safety nets by up to 128%, suggesting that 'perfect' data can be less useful than strategically 'manufactured' data.
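This digest does not describe the paper's exact augmentation procedure. As a rough illustration only, here is a minimal sketch of interpolation-based oversampling in the style of SMOTE, a standard technique swapped in here as an assumption. Each synthetic household is placed at a random point on the line segment between a real poor household and one of its k nearest poor neighbours. The helper `interpolate_minority`, its parameters, and the toy feature values are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolate_minority(minority, n_new, k=3, seed=0):
    """Hypothetical SMOTE-style helper (not the paper's method).

    Creates n_new synthetic rows, each lying on the segment between a
    randomly chosen minority-class row and one of its k nearest
    minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by distance to `base`, keep the k closest
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: euclidean(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform draw in [0, 1)
        # Convex combination: gap=0 gives base, gap near 1 approaches neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical features: consumption index, household size)
poor = [[1.0, 6.0], [1.2, 5.0], [0.9, 7.0], [1.1, 6.0]]
new_rows = interpolate_minority(poor, n_new=5, k=2)
print(len(new_rows))  # 5
```

Because every new row is a convex combination of two real poor households, this kind of augmentation densifies the poverty tail rather than extending it. Interpolating a discrete feature such as household size yields fractional values, so a real pipeline would need to round or otherwise handle such columns.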

From the abstract

Proxy Means Tests (PMTs) are the dominant tool for targeting social transfers in developing countries, yet conventional approaches and out-of-the-box machine learning algorithms perform poorly. This paper shows that three interacting pathologies explain this failure: measurement error in welfare outcomes biases all evaluation metrics and imposes hard performance ceilings; distributional imbalance starves learners of training signal in the poverty-relevant tails; and global loss functions ignore