SciDesignBench is a benchmark of 520 simulator-grounded tasks for scientific inverse design, and it reveals that current LLMs struggle significantly with iterative refinement.
arXiv · March 16, 2026 · 2603.12724
Why it matters
It shifts the focus from simple scientific VQA to practical inverse design: finding inputs that produce a desired physical outcome. Reinforcement Learning from Simulator Feedback (RLSF), introduced with the benchmark, points to a path for training LLMs to solve high-stakes engineering problems in chemistry and physics.
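To make the inverse-design framing concrete, here is a minimal sketch of the asymmetry it rests on: scoring a candidate takes one simulator call, while finding a good candidate takes a search. Everything in it is an illustrative assumption. The `simulate` function is a made-up toy model, and the random hill-climbing loop is a generic baseline, not the benchmark's protocol and not RLSF.

```python
import random


def simulate(design):
    """Hypothetical forward simulator: maps a design (x, y) to a scalar outcome."""
    x, y = design
    return x * x + 3.0 * y


def inverse_design(target, steps=2000, step_size=0.1, seed=0):
    """Search for a design whose simulated outcome matches `target`.

    Plain random hill climbing: propose a small perturbation, score it with
    the simulator, and keep it only if it lands closer to the target.
    """
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_err = abs(simulate(best) - target)
    for _ in range(steps):
        cand = (
            best[0] + rng.uniform(-step_size, step_size),
            best[1] + rng.uniform(-step_size, step_size),
        )
        err = abs(simulate(cand) - target)  # evaluation: one simulator call
        if err < best_err:                  # refinement: keep improvements only
            best, best_err = cand, err
    return best, best_err


design, err = inverse_design(target=5.0)
print(design, err)
```

Each iteration uses the simulator's output as feedback for the next proposal, which is the same evaluate-then-refine pattern the summary says LLMs find difficult, here in its simplest form.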
From the abstract
Many of the most important problems in science and engineering are inverse problems: given a desired outcome, find a design that achieves it. Evaluating whether a candidate meets the spec is often routine; a binding energy can be computed, a reactor yield simulated, a pharmacokinetic profile predicted. But searching a combinatorial design space for inputs that satisfy those targets is fundamentally harder. We introduce SciDesignBench, a benchmark of 520 simulator-grounded tasks across 14 scienti