Vision-Language Models can now be backdoored to literally control where a human looks on their screen.
April 15, 2026
Original Paper
Follow My Eyes: Backdoor Attacks on VLM-based Scanpath Prediction
arXiv · 2604.08766
The Takeaway
As VLMs are increasingly used to predict human gaze (scanpaths) for UI/UX and accessibility work, they become a high-value target for manipulation. This paper demonstrates a backdoor attack that can redirect a user's attention to specific objects or subtly delay their visual search without triggering standard defenses. The attacks are hard to detect because they don't break the model's functionality; they just nudge the predicted 'eye.' This reveals a terrifying new frontier of psychological social engineering, where AI can subtly hijack human visual attention in real time.
From the abstract
Scanpath prediction models forecast the sequence and timing of human fixations during visual search, driving foveated rendering and attention-based interaction in mobile systems where their integrity is a first-class security concern. We present the first study of backdoor attacks against VLM-based scanpath prediction, evaluated on GazeFormer and COCO-Search18. We show that naive fixed-path attacks, while effective, create detectable clustering in the continuous output space. To overcome this, w
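The abstract's point about naive fixed-path attacks is worth unpacking: if a backdoored model always emits (roughly) the same target scanpath, its outputs collapse into a tight cluster in the continuous fixation space, which a defender can spot with simple nearest-neighbor statistics. A minimal sketch of that intuition, using entirely synthetic scanpaths (the data and the `mean_nn_distance` helper are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean predictions: 50 varied scanpaths, each a sequence of 6 (x, y) fixations.
clean = rng.uniform(0.0, 1.0, size=(50, 6, 2))

# Backdoored predictions under a naive fixed-path attack: the model emits
# almost the same target scanpath for every triggered input (small jitter).
target = rng.uniform(0.0, 1.0, size=(6, 2))
poisoned = target + rng.normal(0.0, 0.005, size=(50, 6, 2))

def mean_nn_distance(paths):
    """Mean distance from each flattened scanpath to its nearest neighbor."""
    flat = paths.reshape(len(paths), -1)
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore self-distances
    return dists.min(axis=1).mean()

print(f"clean:    {mean_nn_distance(clean):.4f}")
print(f"poisoned: {mean_nn_distance(poisoned):.4f}")
```

Clean outputs stay spread out, while fixed-path outputs show a much smaller nearest-neighbor distance, which is exactly the detectable clustering the authors say motivates their more sophisticated attack.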