There is a hard physical ceiling on what images can tell us about our environment.

April 14, 2026

Original Paper

From Pixels to UTCI: A Zero-Shot Framework for Predicting Outdoor Thermal Comfort from Street View Images Using Vision-Language Models

Luo Yanhuo

SSRN · 6570820

The Takeaway

Despite the AI hype, Vision-Language Models fail to predict outdoor thermal comfort from street-view images in a zero-shot setting, because the meteorological variables that UTCI depends on (air temperature, humidity, wind speed, and radiant heat) cannot be observed in pixels. It is a clear negative result against the assumption that more data equals better sensing.
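To make the "ceiling" idea concrete, here is a minimal sketch of the underlying argument; it is not the paper's method. It assumes, for illustration only, that one street scene looks effectively the same to a model under several weather states, and it uses made-up UTCI values. Under that assumption an image-only predictor has to output a single number for the scene, so its error cannot drop below the spread of the true values.

```python
import math
from statistics import mean

# Hypothetical UTCI values (deg C) for ONE fixed street scene under
# different weather. Illustrative numbers only, not results from the paper.
utci_same_scene = [18.0, 24.0, 31.0, 36.0]


def rmse(prediction, targets):
    """Root-mean-square error of a single constant prediction."""
    return math.sqrt(mean((prediction - t) ** 2 for t in targets))


# Assumption: the pixels are the same each time, so an image-only model
# must emit one number for this scene. The mean minimises squared error.
best_guess = mean(utci_same_scene)
floor = rmse(best_guess, utci_same_scene)

print(f"best constant prediction: {best_guess:.2f}")
print(f"irreducible RMSE for this scene: {floor:.2f}")
```

Any other constant guess gives a larger error than `floor`, which is the sense in which the limit comes from the missing inputs rather than from model size or training data. Real photos do carry some weather cues (shadows, sky), so this is an idealised bound, not a measurement.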

From the abstract

Street-level prediction of the Universal Thermal Climate Index (UTCI) typically requires microclimate simulations or dense sensor networks, both costly and difficult to scale. This study proposes a zero-shot framework in which Vision-Language Models (VLMs) predict UTCI directly from street-view images without task-specific training. Two VLMs — Gemini 2.5 Flash (commercial) and LLaVA 1.6 7B (open-source) — are evaluated across four cities in three Köppen climate zones (600 images, 10,800 inferenc