EdgeDiT provides a hardware-aware blueprint for running massive Diffusion Transformers (DiTs) on mobile NPUs, cutting latency by 1.6x.
March 31, 2026
Original Paper
EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation
arXiv · 2603.28405
The Takeaway
The framework systematically prunes structural redundancies that are especially taxing for the dataflows of mobile NPUs from Apple and Qualcomm. It establishes a new Pareto-optimal frontier for on-device image generation, enabling private, offline, high-fidelity synthesis.
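The paper's exact pruning procedure is not detailed in this summary, but the core idea of hardware-aware structural pruning toward a latency target can be sketched as follows. This is an illustrative toy, not EdgeDiT's method: the function name, the importance and latency numbers, and the greedy ratio-based heuristic are all assumptions for demonstration.

```python
# Hypothetical sketch of hardware-aware structural pruning: rank prunable
# units (e.g. attention heads) by importance per unit of estimated on-device
# latency, then drop the least valuable units until a latency budget is met.
# This is NOT the EdgeDiT algorithm; it only illustrates the general idea.

def prune_to_budget(importance, latency_cost, budget):
    """Return sorted indices of units to keep so total latency <= budget.

    importance[i]   -- proxy for unit i's contribution to output quality
    latency_cost[i] -- estimated NPU latency attributable to unit i
    budget          -- target total latency after pruning
    """
    n = len(importance)
    kept = set(range(n))
    total = sum(latency_cost)
    # Remove units with the worst importance-to-latency ratio first.
    order = sorted(range(n), key=lambda i: importance[i] / latency_cost[i])
    for i in order:
        if total <= budget:
            break
        kept.discard(i)
        total -= latency_cost[i]
    return sorted(kept)

# Toy example: 6 attention heads of equal cost, target ~2/3 of the
# original latency. All numbers are made up for illustration.
imp = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3]
cost = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
keep = prune_to_budget(imp, cost, budget=4.0)  # keeps heads 0, 2, 4, 5
```

In a real pipeline, `latency_cost` would come from profiling on the target NPU (e.g. Hexagon or ANE) rather than a fixed table, which is what makes the pruning "hardware-aware" instead of purely FLOP-based.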
From the abstract
Diffusion Transformers (DiT) have established a new state-of-the-art in high-fidelity image synthesis; however, their massive computational complexity and memory requirements hinder local deployment on resource-constrained edge devices. In this paper, we introduce EdgeDiT, a family of hardware-efficient generative transformers specifically engineered for mobile Neural Processing Units (NPUs), such as the Qualcomm Hexagon and Apple Neural Engine (ANE). By leveraging a hardware-aware optimization