AI & ML Nature Is Weird

Giving an AI more time to 'think' can actually make it give you a stupider answer.

April 14, 2026

Original Paper

When More Thinking Hurts: Overthinking in LLM Test-Time Compute Scaling

Shu Zhou, Rui Ling, Junan Chen, Xin Wang, Tao Fan, Hao Wang

arXiv · 2604.10739

The Takeaway

Scaling up test-time compute can lead to 'overthinking,' where the model abandons a correct initial intuition in favor of an incorrect rationalization. This challenges the industry dogma that more Chain-of-Thought always equals better reasoning.

From the abstract

Scaling test-time compute through extended chains of thought has become a dominant paradigm for improving large language model reasoning. However, existing research implicitly assumes that longer thinking always yields better results. This assumption remains largely unexamined. We systematically investigate how the marginal utility of additional reasoning tokens changes as compute budgets increase. We find that marginal returns diminish substantially at higher budgets and that models exhibit 'overthinking.'
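To make the "marginal utility" framing concrete, here is a minimal sketch of how one might tabulate accuracy gained per extra thinking token across increasing budgets. The budget and accuracy numbers are illustrative placeholders, not results from the paper; substitute your own benchmark measurements.

```python
# Sketch: marginal utility of additional reasoning tokens.
# All numbers below are hypothetical, for illustration only.

budgets = [256, 512, 1024, 2048, 4096]      # thinking-token budgets
accuracy = [0.62, 0.71, 0.74, 0.745, 0.73]  # measured accuracy at each budget

def marginal_utility(budgets, accuracy):
    """Accuracy gained per additional 1k thinking tokens between budgets."""
    return [
        (a2 - a1) / ((b2 - b1) / 1000)
        for (b1, a1), (b2, a2) in zip(
            zip(budgets, accuracy), zip(budgets[1:], accuracy[1:])
        )
    ]

mu = marginal_utility(budgets, accuracy)

# Diminishing returns show up as a shrinking sequence; 'overthinking'
# appears at the first budget where the marginal utility turns negative.
overthinking_at = next((b for b, m in zip(budgets[1:], mu) if m < 0), None)
```

In this toy data, each doubling of the budget buys less accuracy than the last, and the final doubling actually loses accuracy, which is the overthinking regime the paper describes.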