AI & ML Efficiency Breakthrough

A training-free enhancement that unlocks multi-scale synergies in Vision Foundation Models (VFMs) to boost performance across various tasks.

March 27, 2026

Original Paper

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

Bocheng Zou, Mu Cai, Mark Stanley, Dingfu Lu, Yong Jae Lee

arXiv · 2603.25744

The Takeaway

MuRF allows frozen VFMs such as DINOv2 to leverage complementary inductive biases from multiple resolutions at inference time. This simple, plug-and-play strategy improves both global recognition and fine-grained refinement, without any expensive retraining or fine-tuning.
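
To make the idea concrete, here is a minimal sketch of multi-scale inference with a frozen backbone. It assumes a DINOv2 ViT-S/14 model loaded from torch.hub; the specific scales (224 and 448 px) and the simple average fusion are illustrative assumptions, not the fusion strategy described in the paper.

```python
import torch
import torch.nn.functional as F

# Frozen DINOv2 ViT-S/14 backbone from torch.hub (no fine-tuning involved).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

@torch.no_grad()
def multiscale_patch_features(image, scales=(224, 448)):
    """Run the frozen backbone at several input resolutions and fuse the
    resulting patch-feature maps.

    `image`: (3, H, W) float tensor, already normalized with ImageNet stats.
    The uniform averaging below is a placeholder fusion rule, not the one
    proposed in the MuRF paper.
    """
    target = max(scales) // model.patch_size   # finest patch-grid side length
    fused = 0.0
    for size in scales:
        x = F.interpolate(image.unsqueeze(0), size=(size, size),
                          mode="bilinear", align_corners=False)
        tokens = model.forward_features(x)["x_norm_patchtokens"]      # (1, N, C)
        grid = size // model.patch_size
        feat = tokens.reshape(1, grid, grid, -1).permute(0, 3, 1, 2)  # (1, C, g, g)
        # Upsample each scale's feature map to the finest grid before fusing.
        feat = F.interpolate(feat, size=(target, target),
                             mode="bilinear", align_corners=False)
        fused = fused + feat
    return fused / len(scales)   # (1, C, target, target) fused feature map
```

Because the backbone stays frozen and only the inputs and a lightweight fusion step change, the extra cost is limited to additional forward passes at the added resolutions.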

From the abstract

Vision Foundation Models (VFMs) have become the cornerstone of modern computer vision, offering robust representations across a wide array of tasks. While recent advances allow these models to handle varying input sizes during training, inference typically remains restricted to a single, fixed scale. This prevalent single-scale paradigm overlooks a fundamental property of visual perception: varying resolutions offer complementary inductive biases, where low-resolution views excel at global semantics …