LLMs can solve planning problems with state spaces as large as 10^165 by acting as program generators rather than direct planners.
March 26, 2026
Original Paper
Language Model Planners do not Scale, but do Formalizers?
arXiv · 2603.23844
The Takeaway
The paper demonstrates a massive scaling disparity: while LLM planners fail on complex planning problems, 'higher-order formalizers' (LLMs that generate code to generate solver inputs) can handle the combinatorial explosion. This shifts the focus in AI planning from 'better reasoning' to 'better formalization'.
From the abstract
Recent work shows overwhelming evidence that LLMs, even those trained to scale their reasoning traces, perform unsatisfactorily when solving planning problems that are too complex. Whether the same conclusion holds for LLM formalizers that generate solver-oriented programs remains unknown. We systematically show that LLM formalizers greatly out-scale LLM planners, some retaining perfect accuracy in the classic BlocksWorld domain with a huge state space of size up to $10^{165}$. […]
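To make the formalizer idea concrete, here is a minimal sketch of the kind of program an LLM might emit instead of a plan: a generator that expands a natural-language task into a formal PDDL problem, which a classical planner (e.g. Fast Downward) then solves. The function name and the exact encoding are illustrative assumptions, not the paper's actual pipeline; the predicates used (`on`, `ontable`, `clear`, `arm-empty`) follow a common BlocksWorld convention, which varies across domain files.

```python
def blocksworld_pddl(n_blocks: int) -> str:
    """Generate a PDDL problem: stack n blocks (all initially on the
    table) into a single tower b0-on-b1-on-...-on-b(n-1).

    Illustrative sketch of the 'formalizer' approach: the LLM writes a
    program like this one, the program expands to a formal encoding,
    and an off-the-shelf planner does the combinatorial search. This is
    an assumed encoding, not the one from the paper.
    """
    blocks = [f"b{i}" for i in range(n_blocks)]
    objects = " ".join(blocks)
    # Initial state: every block on the table with a clear top.
    init = " ".join(f"(ontable {b}) (clear {b})" for b in blocks)
    # Goal: each block sits on the next one, forming one tower.
    goal = " ".join(f"(on {a} {b})" for a, b in zip(blocks, blocks[1:]))
    return (
        f"(define (problem tower-{n_blocks}) (:domain blocksworld)\n"
        f"  (:objects {objects})\n"
        f"  (:init (arm-empty) {init})\n"
        f"  (:goal (and {goal})))"
    )

print(blocksworld_pddl(3))
```

The point of the digest's scaling claim is that this generation step stays cheap as `n_blocks` grows: the program's output size is linear in the number of blocks even though the state space it describes grows combinatorially.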