Training algorithms on 'fake' (synthetic) data can make them markedly better at identifying and reaching poor households in the real world.
March 24, 2026
Original Paper
Optimal Poverty Prediction Under Measurement Error: Theory and Evidence from Augmented Machine Learning
SSRN · 6332698
The Takeaway
Standard poverty surveys are often too noisy for machine learning to handle. The researchers found that mathematically 'augmenting' the data through interpolation lowers the error floor imposed by measurement noise and improves the impact of social safety nets by up to 128%, suggesting that strategically 'manufactured' data can be more useful than waiting for 'perfect' data.
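As a rough illustration of what interpolation-based augmentation can look like, here is a minimal mixup-style sketch: synthetic rows are created as convex combinations of real survey rows, oversampling the poverty-relevant tail. This is an assumption-laden sketch, not the paper's actual method; the function `interpolate_augment`, the quartile cutoff, and all parameters are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def interpolate_augment(X, y, n_new, alpha=0.4):
    """Create synthetic rows by convex interpolation between random pairs.

    A mixup-style sketch; the paper's exact augmentation scheme is not
    reproduced here.
    """
    i = rng.integers(0, len(X), size=n_new)
    j = rng.integers(0, len(X), size=n_new)
    lam = rng.beta(alpha, alpha, size=n_new)[:, None]  # mixing weights in (0, 1)
    X_new = lam * X[i] + (1 - lam) * X[j]
    y_new = lam[:, 0] * y[i] + (1 - lam[:, 0]) * y[j]
    return X_new, y_new

# Toy data: survey features and a noisy welfare outcome.
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

# Oversample the poverty-relevant tail: interpolate only among
# the poorest quartile, where training signal is scarcest.
threshold = np.quantile(y, 0.25)
tail = y < threshold
X_aug, y_aug = interpolate_augment(X[tail], y[tail], n_new=100)

X_train = np.vstack([X, X_aug])
y_train = np.concatenate([y, y_aug])
```

Because each synthetic outcome is a convex combination of two tail outcomes, every augmented row stays inside the poverty tail, so the learner sees more signal exactly where the abstract says it is starved.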
From the abstract
Proxy Means Tests (PMTs) are the dominant tool for targeting social transfers in developing countries, yet conventional approaches and out-of-the-box machine learning algorithms perform poorly. This paper shows that three interacting pathologies explain this failure: measurement error in welfare outcomes biases all evaluation metrics and imposes hard performance ceilings; distributional imbalance starves learners of training signal in the poverty-relevant tails; and global loss functions ignore …