AFS-Search introduces a training-free closed-loop framework that addresses spatial grounding errors in flow-based text-to-image models such as FLUX.1.
arXiv · March 20, 2026 · 2603.18627
The Takeaway
It uses a vision-language model (VLM) as a 'semantic critic' that scores parallel lookahead rollouts during generation and dynamically steers the flow toward the spatial layout the prompt describes. The authors report state-of-the-art spatial control without fine-tuning or ControlNets.
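To make the closed-loop idea concrete, here is a minimal toy sketch of critic-guided steering on a 2D flow. It is not the paper's algorithm: `velocity` is a hypothetical analytic stand-in for the flow model, `critic_score` is a hypothetical stand-in for the VLM critic (a plain distance to a preferred endpoint), and the candidate set is a fixed handful of small offsets rather than anything the authors describe. It only illustrates the pattern of proposing candidates, looking ahead, scoring, and picking the best before each integration step.

```python
# Toy illustration only: every name below is hypothetical, not from the paper.

def velocity(x, t):
    # Stand-in for the flow model: drifts the state toward the point (1, 0).
    return [1.0 - x[0], 0.0 - x[1]]

def lookahead(x, t):
    # Cheap rollout: extrapolate to t = 1 along the current velocity.
    v = velocity(x, t)
    return [xi + (1.0 - t) * vi for xi, vi in zip(x, v)]

def critic_score(x_final):
    # Stand-in for the VLM critic: higher is better, prefers endpoints near (1, 1).
    return -((x_final[0] - 1.0) ** 2 + (x_final[1] - 1.0) ** 2)

def propose(x, delta):
    # Candidate states: the current state first, then small axis-aligned nudges.
    offsets = [(0.0, 0.0), (delta, 0.0), (-delta, 0.0), (0.0, delta), (0.0, -delta)]
    return [[x[0] + dx, x[1] + dy] for dx, dy in offsets]

def open_loop_sample(x0, steps=20):
    # Plain Euler integration with no feedback.
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def steered_sample(x0, steps=20, delta=0.1):
    # Closed loop: before each Euler step, keep the candidate whose
    # lookahead endpoint the critic scores highest.
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = max(propose(x, delta), key=lambda c: critic_score(lookahead(c, t)))
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    baseline = open_loop_sample([0.0, 0.0])
    steered = steered_sample([0.0, 0.0])
    print("open-loop:", baseline, critic_score(baseline))
    print("steered:  ", steered, critic_score(steered))
```

In this toy setup the open-loop sampler never leaves the horizontal axis, because the stand-in velocity field has no vertical drift, while the steered sampler moves toward the endpoint the critic prefers. In a real system the candidates would be scored on decoded image previews and could be evaluated in parallel.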
From the abstract
Precise Text-to-Image (T2I) generation has achieved great success but is hindered by the limited relational reasoning of static text encoders and the error accumulation in open-loop sampling. Without real-time feedback, initial semantic ambiguities during the Ordinary Differential Equation trajectory inevitably escalate into stochastic deviations from spatial constraints. To bridge this gap, we introduce AFS-Search (Agentic Flow Steering and Parallel Rollout Search), a training-free closed-loop