AI & ML Efficiency Breakthrough

AwaRes enables low-resolution Vision-Language Models to retrieve only the high-resolution image crops needed for a specific query via tool-calling.

March 19, 2026

Original Paper

Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

Nimrod Shabtay, Moshe Kimhi, Artem Spector, Sivan Haray, Ehud Rivlin, Chaim Baskin, Raja Giryes, Eli Schwartz

arXiv · 2603.16932

The Takeaway

AwaRes resolves the standard accuracy-efficiency trade-off in VLMs by processing high-detail visual segments only on demand. It uses GRPO training to teach models when and where to look, allowing small models to handle high-resolution tasks (like reading small text) at a fraction of the compute cost.
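The core mechanism is a tool the model can call with a region of interest, receiving back a high-resolution crop of just that region. The sketch below illustrates the idea; the function name, bounding-box format, and grid-based image representation are illustrative assumptions, not the paper's actual API.

```python
def crop_tool(image, bbox_norm):
    """Hypothetical crop-retrieval tool: return the high-resolution
    sub-region of `image` selected by a model's tool call.

    image: full-resolution pixel grid (list of rows of pixel values)
    bbox_norm: (x0, y0, x1, y1) with coordinates normalized to [0, 1],
               as a VLM might emit them in a tool-call argument
    """
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = bbox_norm
    # Map normalized coordinates to pixel indices, keeping the crop non-empty.
    c0, c1 = int(x0 * w), max(int(x0 * w) + 1, int(x1 * w))
    r0, r1 = int(y0 * h), max(int(y0 * h) + 1, int(y1 * h))
    return [row[c0:c1] for row in image[r0:r1]]


# Example: the model sees a low-resolution global view, decides the answer
# lies in the center of the image, and requests that region at full detail.
full_res = [[(r, c) for c in range(8)] for r in range(8)]
crop = crop_tool(full_res, (0.25, 0.25, 0.75, 0.75))
```

Only the retrieved crop is re-encoded at high resolution, so compute scales with how much detail the query actually needs rather than with the full image size.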

From the abstract

Vision-language models (VLMs) typically process images at their native high resolution, forcing a trade-off between accuracy and computational efficiency: high-resolution inputs capture fine details but incur significant computational costs, while low-resolution inputs favor efficiency but risk missing critical visual information, like small text. We present AwaRes, a spatial-on-demand framework that resolves this accuracy-efficiency trade-off by operating on a low-resolution global view […]