Physics Practical Magic

A new light-based processor can scan an entire library’s worth of AI memory just as fast as it scans a single page.

March 24, 2026

Original Paper

PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection

Hyoseok Park, Yeonsang Park

arXiv · 2603.21576

The Takeaway

AI models currently slow down and consume massive amounts of energy because, for every word they generate, they must re-scan their stored context token by token. This hardware uses photons to perform that coarse search at a fixed speed regardless of context length, breaking the physical "memory wall" that has historically limited how long AI conversations can run.

From the abstract

Long-context LLM inference is bottlenecked not by compute but by the O(n) memory bandwidth cost of scanning the KV cache at every decode step -- a wall that no amount of arithmetic scaling can break. Recent photonic accelerators have demonstrated impressive throughput for dense attention computation; however, these approaches inherit the same O(n) memory scaling as electronic attention when applied to long contexts. We observe that the real leverage point is the coarse block-selection step: a me
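To make the bottleneck concrete, here is a minimal electronic sketch of the idea the abstract points at: instead of scoring all n cached keys at every decode step (O(n) memory traffic), the KV cache is split into blocks, each block is scored once against the query via a cheap per-block summary, and full attention runs only inside the top-k blocks. All names, sizes, and the mean-key summary are illustrative assumptions; the excerpt does not describe PRISM's actual selection rule, only that the block-selection step is what its photonic hardware performs in O(1).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_blocks, block_size = 64, 32, 128
n = n_blocks * block_size  # total cached tokens

# Toy KV cache and current query (random stand-ins).
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
q = rng.standard_normal(d)

def dense_attention(q, K, V):
    """Baseline: touches every cached key -> O(n) memory traffic per step."""
    s = K @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

def block_sparse_attention(q, K, V, k=4):
    """Score one summary vector per block, then attend only in the top-k blocks.

    The per-block scoring loop is the step PRISM is said to offload to
    photonics; this NumPy version just shows the algorithmic shape.
    """
    Kb = K.reshape(n_blocks, block_size, d)
    Vb = V.reshape(n_blocks, block_size, d)
    block_scores = Kb.mean(axis=1) @ q          # one score per block (assumed summary)
    top = np.argsort(block_scores)[-k:]         # keep the k highest-scoring blocks
    Ks = Kb[top].reshape(-1, d)                 # k * block_size keys actually read
    Vs = Vb[top].reshape(-1, d)
    s = Ks @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ Vs

out = block_sparse_attention(q, K, V)
print(out.shape)
```

With k=4 of 32 blocks selected, the inner attention reads 512 of the 4096 cached tokens; only the 32 block summaries are scanned in full, which is the part the paper argues can be made effectively constant-time in optics.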