Analyzes over 10,000 experiments to prove that LLM agents are capable of genuine architectural discovery rather than just hyperparameter tuning.
March 18, 2026
Original Paper
Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments
arXiv · 2603.15916
The Takeaway
The study validates that autonomous agents can identify novel architectural configurations (e.g., V-JEPA with Zipformer) that outperform human-proposed baselines. This provides the first large-scale empirical proof that LLMs can act as effective, non-trivial research assistants.
From the abstract
When LLM agents autonomously design ML experiments, do they perform genuine architecture search -- or do they default to hyperparameter tuning within a narrow region of the design space? We answer this question by analyzing 10,469 experiments executed by two LLM agents (Claude Opus and Gemini 2.5 Pro) across a combinatorial configuration space of 108,000 discrete cells for dashcam collision detection over 27 days. Through ANOVA decomposition, we find that \textbf{architectural choices explain 94