AI & ML Paradigm Challenge

Get this: when researchers tried to re-run the code behind a sample of those fancy Nature papers, only about 10% of it actually worked.

March 25, 2026

Original Paper

A Study of Scientific Computational Notebook Quality

Shun Kashiwa, Ayla Kurdak, Savitha Ravi, Ridhi Srikanth, Angel Thakur, Sonia Chandra, Jonathan Truong, Michael Coblenz

arXiv · 2603.22726

The Takeaway

Researchers tried to re-run the code from 19 Nature publications from 2024 and found that only two worked correctly; the rest were held back by missing data and messy logic. The study suggests that much of the world's most prestigious research is built on a foundation of 'tangled' code that other scientists cannot verify.

From the abstract

The quality of scientific code is a critical concern for the research community. Poorly written code can result in irreproducible results, incorrect findings, and slower scientific progress. In this study, we evaluate scientific code quality across three dimensions: reproducibility, readability, and reusability. We curated a corpus of 518 code repositories by analyzing Code Availability statements from all 1239 Nature publications in 2024. To assess code quality, we employed multiple methods, in