AI & ML Efficiency Breakthrough

DAPA speeds up GELU computation by 16x and reduces hardware DSP utilization by 16x for on-device Transformer deployment.

March 23, 2026

Original Paper

DAPA: Distribution Aware Piecewise Activation Functions for On-Device Transformer Inference and Training

Maoyang Xiang, Bo Wang

arXiv · 2603.19338

The Takeaway

Traditional piecewise linear approximations lose accuracy in high-probability regions; DAPA uses a differentiable, non-uniform piecewise approach that preserves Transformer performance while drastically reducing the cost of non-linear activations.
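To make the idea concrete, here is a minimal sketch (not the paper's method) of why non-uniform breakpoints help: with the same budget of segments, placing knots densely where pre-activations concentrate (near zero, where GELU curves the most) yields a much smaller approximation error than uniform spacing. The knot positions below are illustrative assumptions; DAPA's breakpoints are derived from the actual pre-activation distribution.

```python
import numpy as np

# tanh-based GELU (the standard approximation used in many Transformer codebases)
def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def pwl(x, knots):
    # Piecewise-linear approximation: sample GELU at the knots, interpolate linearly.
    return np.interp(x, knots, gelu(knots))

# Same budget of 11 breakpoints for both schemes.
uniform = np.linspace(-6.0, 6.0, 11)
# Hypothetical non-uniform knots, denser near zero where pre-activations
# concentrate; the paper instead derives breakpoints from the data distribution.
nonuniform = np.array([-6.0, -3.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 3.0, 6.0])

xs = np.linspace(-4.0, 4.0, 2001)  # typical pre-activation range
err_uniform = np.max(np.abs(pwl(xs, uniform) - gelu(xs)))
err_nonuniform = np.max(np.abs(pwl(xs, nonuniform) - gelu(xs)))
print(f"uniform max error:     {err_uniform:.4f}")
print(f"non-uniform max error: {err_nonuniform:.4f}")
```

In hardware, each segment reduces to one multiply and one add plus a small lookup of slope/intercept pairs, which is why a good piecewise fit can replace the costly exact non-linearity.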

From the abstract

Non-linear activation functions play a pivotal role in on-device inference and training: they not only consume substantial hardware resources but also have a significant impact on system performance and energy efficiency. In this work, we propose Distribution-Aware Piecewise Activation (DAPA), a differentiable and hardware-friendly activation function for Transformer architectures that exploits the distribution of pre-activation data. DAPA employs a non-uniform piecewise approximation that …