There is a hard physical ceiling on what images can tell us about our environment.
April 14, 2026
Original Paper
From Pixels to UTCI: A Zero-Shot Framework for Predicting Outdoor Thermal Comfort from Street View Images Using Vision-Language Models
SSRN · 6570820
The Takeaway
Despite the AI hype, Vision-Language Models fail to predict outdoor thermal comfort because the meteorological variables that determine UTCI, such as air temperature, humidity, and wind speed, cannot be observed in pixels. This is a definitive negative result against the assumption that more data equals better sensing.
From the abstract
Street-level prediction of the Universal Thermal Climate Index (UTCI) typically requires microclimate simulations or dense sensor networks, both costly and difficult to scale. This study proposes a zero-shot framework in which Vision-Language Models (VLMs) predict UTCI directly from street-view images without task-specific training. Two VLMs — Gemini 2.5 Flash (commercial) and LLaVA 1.6 7B (open-source) — are evaluated across four cities in three Köppen climate zones (600 images, 10,800 inferenc