We swapped out a piece of an AI’s digital brain for actual light, so part of its thinking now happens at optical speed.
This research moves the Softmax function from digital chips into analog lithium niobate modulators, which could dramatically cut the latency of AI models. It’s a major step toward hardware that computes as fast as light can travel through it.
Integrated electro-optic attention nonlinearities for transformers
arXiv · 2604.09512
Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. At the core of these models lies the attention mechanism, which requires a nonlinear, non-negative mapping realized by the Softmax function. However, although Softmax operations account for less than 1% of the total operation count, they can disproportionately bottleneck overall inference latency. Here, we use thin-film lithium niobate (TFLN) Mach-Zehnder modulators to realize the attention nonlinearity in the analog optical domain.
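For orientation, the Softmax is the only nonlinear, non-negative step inside scaled dot-product attention; everything around it is matrix multiplication. Below is a minimal NumPy sketch of standard digital attention, purely illustrative and not the paper's electro-optic implementation; the shapes and variable names are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability, then normalize so the
    # outputs are non-negative and sum to 1 along `axis`.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: the Softmax is the single nonlinear,
    # non-negative mapping here; the rest is matrix multiplication.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)   # attention weights per query
    return weights @ V                   # weighted sum of value vectors

# Toy example (hypothetical sizes): 4 query tokens, 6 key/value tokens, dim 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((6, 8))
V = rng.standard_normal((6, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```

In this picture, the paper's contribution is to compute the `softmax` step in analog optics rather than in digital logic, while the surrounding matrix multiplications are unchanged.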