AI & ML Paradigm Shift

Introduces the Budget-Sensitive Discovery Score (BSDS), a formally verified metric, backed by 20 theorems machine-checked in Lean 4, for evaluating AI-guided scientific candidate selection.

arXiv · March 16, 2026 · 2603.12349

Abhinaba Basu, Pavan Chakraborty

Why it matters

AI for science lacks reliable, budget-aware benchmarks that penalize both false positives and abstention. This verified framework prevents cherry-picking of evaluation budgets and provides a rigorous standard for assessing whether LLMs actually add value to scientific pipelines.

From the abstract

Scientific discovery increasingly relies on AI systems to select candidates for expensive experimental validation, yet no principled, budget-aware evaluation framework exists for comparing selection strategies -- a gap intensified by large language models (LLMs), which generate plausible scientific proposals without reliable downstream evaluation. We introduce the Budget-Sensitive Discovery Score (BSDS), a formally verified metric -- 20 theorems machine-checked by the Lean 4 proof assistant -- t