AI & ML Paradigm Shift

Analyzes 10,469 experiments to show that LLM agents are capable of genuine architectural discovery rather than just hyperparameter tuning.

March 18, 2026

Original Paper

Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments

Xiaoyi Li

arXiv · 2603.15916

The Takeaway

The study reports that autonomous agents can identify novel architectural configurations (e.g., V-JEPA with Zipformer) that outperform human-proposed baselines. It offers the first large-scale empirical evidence that LLMs can act as effective, non-trivial research assistants.

From the abstract

When LLM agents autonomously design ML experiments, do they perform genuine architecture search -- or do they default to hyperparameter tuning within a narrow region of the design space? We answer this question by analyzing 10,469 experiments executed by two LLM agents (Claude Opus and Gemini 2.5 Pro) across a combinatorial configuration space of 108,000 discrete cells for dashcam collision detection over 27 days. Through ANOVA decomposition, we find that \textbf{architectural choices explain 94