
VISTA decouples hypothesis generation from prompt rewriting to avoid the local optima and opaque, black-box behavior that limit current automatic prompt optimizers.

arXiv · March 20, 2026 · 2603.18388

Shiyan Liu, Qifeng Xia, Qiyun Xia, Yisheng Liu, Xinyu Yu, Rui Qu

The Takeaway

VISTA turns prompt optimization from a "shot in the dark" into an interpretable trace with semantic labels. This lets it recover performance even from defective seed prompts, where standard reflective methods such as GEPA degrade model accuracy.
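To make the decoupling concrete, here is a minimal Python sketch that assumes a greedy accept-if-better loop. It is not VISTA's algorithm. The names `Hypothesis`, `TraceStep`, `optimize`, `toy_propose`, `toy_rewrite`, and `toy_evaluate`, the label strings, and the keyword-based scorer are all hypothetical stand-ins for LLM calls and a real benchmark. The sketch shows only the structure described above. One step proposes failure hypotheses, each carrying a semantic label. A separate step rewrites the prompt to test exactly one hypothesis. Every attempt is logged, so the final prompt comes with a readable record of which fixes helped.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch only; NOT the VISTA implementation.

@dataclass
class Hypothesis:
    label: str   # semantic label naming the suspected failure cause
    action: str  # "add" or "remove" (hypothetical edit vocabulary)
    edit: str    # text to add to, or remove from, the prompt

@dataclass
class TraceStep:
    label: str
    prompt: str
    score: float
    accepted: bool

def optimize(seed: str,
             propose: Callable[[str], list[Hypothesis]],
             rewrite: Callable[[str, Hypothesis], str],
             evaluate: Callable[[str], float],
             rounds: int = 3) -> tuple[str, list[TraceStep]]:
    best, best_score = seed, evaluate(seed)
    trace = [TraceStep("seed", seed, best_score, True)]
    for _ in range(rounds):
        # Stage 1: hypothesis generation, independent of any rewriting.
        for hyp in propose(best):
            # Stage 2: rewrite the current best prompt to test one hypothesis.
            candidate = rewrite(best, hyp)
            score = evaluate(candidate)
            accepted = score > best_score
            # Log every attempt under its label, accepted or not.
            trace.append(TraceStep(hyp.label, candidate, score, accepted))
            if accepted:
                best, best_score = candidate, score
    return best, trace

# Toy stand-ins (hypothetical): keyword checks replace an LLM and a dataset.
REQUIRED = ("step by step", "final answer:")
DEFECT = "Ignore the question."

def toy_evaluate(prompt: str) -> float:
    p = prompt.lower()
    score = sum(phrase in p for phrase in REQUIRED) / len(REQUIRED)
    return score - 0.5 if DEFECT in prompt else score

def toy_propose(prompt: str) -> list[Hypothesis]:
    p = prompt.lower()
    hyps = []
    if DEFECT in prompt:
        hyps.append(Hypothesis("remove_defect", "remove", DEFECT))
    if "step by step" not in p:
        hyps.append(Hypothesis("add_reasoning", "add", "Think step by step."))
    if "final answer:" not in p:
        hyps.append(Hypothesis("add_format", "add", "End with 'Final answer: <number>'."))
    return hyps

def toy_rewrite(prompt: str, hyp: Hypothesis) -> str:
    if hyp.action == "remove":
        # Drop the offending text and normalize whitespace.
        return " ".join(prompt.replace(hyp.edit, "").split())
    return f"{prompt} {hyp.edit}"

if __name__ == "__main__":
    best, trace = optimize("Solve the problem. Ignore the question.",
                           toy_propose, toy_rewrite, toy_evaluate)
    for step in trace:
        print(f"{step.label:15} score={step.score:+.2f} accepted={step.accepted}")
    print("final prompt:", best)
```

Because each trace entry names the hypothesis it tested, a drop in score can be attributed to a specific labeled change rather than to an unexplained rewrite. The greedy acceptance rule is a simplification chosen for brevity.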

From the abstract

Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed, GEPA degrades accuracy from 23.81% to 13.50%. We propose…
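For scale, the GEPA result quoted in the abstract works out as follows. This is simple arithmetic on the two reported accuracies, with no additional data:

```latex
\Delta_{\text{abs}} = 23.81\% - 13.50\% = 10.31 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{10.31}{23.81} \approx 0.433
```

In other words, on this defective seed GEPA loses roughly 43% of the accuracy the seed prompt started with.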