A unified quantization and runtime framework for deploying multiple LoRA-adapted generative models on edge devices simultaneously.
April 1, 2026
Original Paper
Quantization with Unified Adaptive Distillation to enable multi-LoRA based one-for-all Generative Vision Models on edge
arXiv · 2603.29535
The Takeaway
The framework enables 'one-for-all' generative vision models on mobile NPUs by treating LoRA weights as runtime inputs and aligning them under a shared quantization profile. This reduces memory footprint by up to 6x, allowing complex multi-task image editing tools to run locally on resource-constrained hardware.
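The core idea can be sketched as follows: the base model's weights are quantized once (a single compiled binary), while low-rank adapter matrices for each task are supplied at inference time rather than baked into separate binaries. This is a minimal illustrative sketch, not the paper's implementation; the quantization scheme, function names, and shapes are all assumptions.

```python
import numpy as np

def quantize_int8(w):
    # Per-tensor symmetric int8 quantization; one shared scale for the base
    # weight, which every adapter must align with (illustrative, not QUAD).
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)

# Hypothetical frozen base layer weight, quantized once for the device binary.
W = rng.normal(size=(64, 64)).astype(np.float32)
Wq, scale = quantize_int8(W)

def forward(x, lora_A, lora_B, alpha=1.0):
    # LoRA matrices arrive as *runtime inputs*; the quantized base is reused.
    # Effective weight: dequant(Wq) + alpha * B @ A  (a rank-r update).
    base = x @ (Wq.astype(np.float32) * scale).T
    delta = (x @ lora_A.T) @ lora_B.T * alpha
    return base + delta

# Two task adapters of rank 4 share the same quantized base binary.
r = 4
A1 = rng.normal(size=(r, 64)).astype(np.float32)
B1 = rng.normal(size=(64, r)).astype(np.float32)
A2 = rng.normal(size=(r, 64)).astype(np.float32)
B2 = rng.normal(size=(64, r)).astype(np.float32)

x = rng.normal(size=(1, 64)).astype(np.float32)
y_task1 = forward(x, A1, B1)  # e.g., object removal
y_task2 = forward(x, A2, B2)  # e.g., prompt-guided transformation
```

Because only the small rank-r matrices differ per task, switching tasks costs a few kilobytes of adapter weights instead of a full model binary, which is the memory saving the takeaway describes.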
From the abstract
Generative Artificial Intelligence (GenAI) features such as image editing, object removal, and prompt-guided image transformation are increasingly integrated into mobile applications. However, deploying Large Vision Models (LVMs) for such tasks on resource-constrained devices remains challenging due to their high memory and compute requirements. While Low-Rank Adapters (LoRAs) enable parameter-efficient task adaptation, existing mobile deployment pipelines typically compile separate model binaries …