AI & ML Nature Is Weird

AI models guess the correct final answer to hard competition math problems about 80 percent of the time, yet almost never manage to formally prove them.

April 20, 2026

Original Paper

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

Chengwu Liu, Yichun Yin, Ye Yuan, Jiaxuan Xie, Botao Li, Siqi Li, Jianhao Shen, Yan Xu, Lifeng Shang, Ming Zhang

arXiv · 2604.15839

The Takeaway

A massive gap exists between an AI's ability to know a mathematical truth and its ability to logically prove it in Lean 4. Formal provers struggle to construct proofs even when the underlying LLM already has the correct final answer. This suggests that AI intuition develops much faster than the capacity for rigorous, step-by-step verification. The model understands the destination of a complex mathematical problem without knowing the path to get there. Bridging this gap is the next major hurdle for creating AI that can genuinely assist in scientific discovery.

From the abstract

Most ATP benchmarks embed the final answer within the formal statement -- a convention we call "Easy Mode" -- a design that simplifies the task relative to what human competitors face and may lead to optimistic estimates of model capability. We call the stricter, more realistic setting "Hard Mode": the system must independently discover the answer before constructing a formal proof. To enable Hard Mode research, we make two contributions. First, we release MiniF2F-Hard and FIMO-Hard, expert-rean
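To make the Easy Mode / Hard Mode distinction concrete, here is a minimal illustrative sketch in Lean 4 (my own toy example, not drawn from the paper's benchmarks):

```lean
-- "Easy Mode": the final answer (3) is already embedded in the
-- formal statement, so the system only has to verify it.
theorem easy_mode : 1 + 2 = 3 := rfl

-- "Hard Mode": the answer is hidden behind an existential, so the
-- system must first discover the witness (3) and then prove it.
theorem hard_mode : ∃ n : Nat, 1 + 2 = n := ⟨3, rfl⟩
```

On a real competition problem, discovering the witness is the mathematically hard step; Easy Mode benchmarks effectively hand it to the prover for free.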