AI & ML Paradigm Shift

LLMs can solve planning problems with state spaces as large as 10^165 by acting as program generators rather than direct planners.

March 26, 2026

Original Paper

Language Model Planners do not Scale, but do Formalizers?

Owen Jiang, Cassie Huang, Ashish Sabharwal, Li Zhang

arXiv · 2603.23844

The Takeaway

The paper demonstrates a stark scaling disparity: while LLM planners break down as problem complexity grows, 'higher-order formalizers' (LLMs that generate code to generate solvers) can handle combinatorially exploding state spaces. This shifts the focus in AI planning from 'better reasoning' to 'better formalization'.
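To make the planner/formalizer distinction concrete, here is a minimal sketch (not the paper's code, and simpler than its solver-oriented programs): instead of asking the model to emit a plan step by step, the model emits a short program, and that program — not the model — constructs the plan. BlocksWorld admits a classic two-phase strategy a generated program can use: unstack every block onto the table, then build each goal tower bottom-up. It runs in time linear in the number of blocks, regardless of state-space size. The function name and action labels are illustrative choices, not the paper's.

```python
# Illustrative sketch: a program a "formalizer" LLM might emit for BlocksWorld.
# The program, not the LLM, does the planning, so state-space size is irrelevant.

def solve_blocksworld(towers, goal_towers):
    """towers / goal_towers: lists of towers, each a list of blocks, bottom first.
    Returns a (suboptimal but valid) plan as a list of action tuples."""
    plan = []
    # Phase 1: clear everything -- move each stacked block onto the table, top-down.
    for tower in towers:
        for block in reversed(tower[1:]):
            plan.append(("move-to-table", block))
    # Phase 2: every block is now clear, so build each goal tower bottom-up.
    for tower in goal_towers:
        for below, above in zip(tower, tower[1:]):
            plan.append(("stack", above, below))
    return plan

# Invert a 3-block tower: a under b under c  ->  c under b under a.
print(solve_blocksworld([["a", "b", "c"]], [["c", "b", "a"]]))
```

The point of the contrast: a direct LLM planner must reason over an exponentially large search space in its own trace, while the generated program sidesteps the search entirely with a domain-specific strategy.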

From the abstract

Recent work shows overwhelming evidence that LLMs, even those trained to scale their reasoning trace, perform unsatisfactorily when solving planning problems too complex. Whether the same conclusion holds for LLM formalizers that generate solver-oriented programs remains unknown. We systematically show that LLM formalizers greatly out-scale LLM planners, some retaining perfect accuracy in the classic BlocksWorld domain with a huge state space of size up to $10^{165}$. …
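For intuition on the $10^{165}$ figure: one standard way to count BlocksWorld states with $n$ labeled blocks (towers on a table, no gripper) is OEIS A000262, the number of ways to partition the blocks into ordered stacks, computable by the recurrence $a(n) = (2n-1)\,a(n-1) - (n-1)(n-2)\,a(n-2)$. The sketch below (my illustration, not the paper's accounting, which may count states differently) shows how quickly this reaches the quoted scale.

```python
# Count BlocksWorld states (OEIS A000262: sets of ordered stacks of n labeled blocks)
# via the recurrence a(n) = (2n-1)*a(n-1) - (n-1)*(n-2)*a(n-2), a(0) = a(1) = 1.

def blocksworld_states(n):
    a, b = 1, 1  # a(0), a(1)
    if n == 0:
        return a
    for k in range(2, n + 1):
        a, b = b, (2 * k - 1) * b - (k - 1) * (k - 2) * a
    return b

assert blocksworld_states(3) == 13  # the textbook 3-block count

# Find how few blocks it takes to exceed 10^165 states.
n = 1
while len(str(blocksworld_states(n))) <= 165:  # true iff a(n) < 10^165
    n += 1
print(f"{n} blocks already give more than 10^165 states")
```

With only a few dozen blocks the count dwarfs anything a step-by-step planner could search in-context, which is why the scaling comparison in the paper is so lopsided.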