A training-free enhancement that unlocks multi-scale synergies in Vision Foundation Models (VFMs) to boost performance across various tasks.
March 27, 2026
Original Paper
MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models
arXiv · 2603.25744
The Takeaway
MuRF allows frozen VFMs (such as DINOv2) to exploit complementary inductive biases from multiple input resolutions at inference time. This simple plug-and-play strategy improves global recognition and fine-grained refinement simultaneously, without expensive retraining or fine-tuning.
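The multi-scale idea can be sketched as follows. This is a toy illustration only: `frozen_encoder`, its patch size, and the simple averaging fusion are stand-in assumptions, not the paper's actual components or aggregation rule.

```python
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.standard_normal((3, 8))  # fixed "weights" of the toy encoder

def frozen_encoder(image, patch=14):
    """Stand-in for a frozen VFM: average-pools non-overlapping patches
    of an (H, W, 3) image, then projects each patch to an 8-dim feature."""
    H, W, _ = image.shape
    h, w = H // patch, W // patch
    patches = image[: h * patch, : w * patch].reshape(h, patch, w, patch, 3)
    pooled = patches.mean(axis=(1, 3))          # (h, w, 3)
    return pooled @ PROJ                        # (h, w, 8)

def nn_resize(arr, new_h, new_w):
    """Nearest-neighbor resize along the first two axes."""
    rows = np.arange(new_h) * arr.shape[0] // new_h
    cols = np.arange(new_w) * arr.shape[1] // new_w
    return arr[rows][:, cols]

def multi_scale_features(image, encoder, scales=(0.5, 1.0)):
    """Run the frozen encoder at several input resolutions and average
    the resulting feature grids on the finest grid: low-res passes
    contribute global context, high-res passes contribute detail."""
    H, W, _ = image.shape
    grids = [encoder(nn_resize(image, int(H * s), int(W * s))) for s in scales]
    h = max(g.shape[0] for g in grids)
    w = max(g.shape[1] for g in grids)
    aligned = [nn_resize(g, h, w) for g in grids]
    return np.mean(aligned, axis=0)

img = rng.random((224, 224, 3))
fused = multi_scale_features(img, frozen_encoder)
print(fused.shape)  # (16, 16, 8)
```

Because the encoder is never updated, the whole procedure is training-free: only the inference loop changes, which is what makes the strategy plug-and-play.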
From the abstract
Vision Foundation Models (VFMs) have become the cornerstone of modern computer vision, offering robust representations across a wide array of tasks. While recent advances allow these models to handle varying input sizes during training, inference typically remains restricted to a single, fixed scale. This prevalent single-scale paradigm overlooks a fundamental property of visual perception: varying resolutions offer complementary inductive biases, where low-resolution views excel at global semantics …