VISTA decouples hypothesis generation from prompt rewriting to escape the local optima and black-box nature of current automatic prompt optimizers.
arXiv · March 20, 2026 · 2603.18388
The Takeaway
VISTA turns prompt optimization from a 'shot in the dark' into an interpretable trace with semantic labels. This lets it recover performance even from defective seed prompts, where standard reflective methods such as GEPA degrade model accuracy.
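To put that degradation in concrete terms, the short calculation below works through the GSM8K figures quoted in the abstract (23.81% falling to 13.50% under GEPA with a defective seed). The variable names are illustrative and not taken from the paper; only the two accuracy values come from the source.

```python
# Accuracy figures quoted in the abstract (GSM8K, defective seed prompt).
seed_accuracy = 23.81   # percent, with the defective seed prompt
gepa_accuracy = 13.50   # percent, after GEPA optimization

# Absolute change in percentage points, and the change relative to the seed.
absolute_drop = seed_accuracy - gepa_accuracy
relative_drop = absolute_drop / seed_accuracy * 100

print(f"absolute drop: {absolute_drop:.2f} percentage points")  # 10.31
print(f"relative drop: {relative_drop:.1f}% of seed accuracy")  # 43.3
```

In other words, the optimizer gives up roughly two fifths of the accuracy the defective seed prompt already had, which is the failure mode the takeaway refers to.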
From the abstract
Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed, GEPA degrades accuracy from 23.81% to 13.50%. We propos