AI & ML Efficiency Breakthrough

An evolutionary framework for GPU kernel generation that outperforms frontier models like Claude 4.6 and Gemini 3.0.

March 31, 2026

Original Paper

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization

He Du, Qiming Ge, Jiakai Hu, Aijun Yang, Zheng Cai, Zixian Huang, Sheng Yuan, Qinxiu Cheng, Xinchen Xie, Yicheng Chen, Yining Li, Jiaxing Xie, Huanan Dong, Yaguang Wu, Xiangjun Huang, Jian Yang, Hui Wang, Bowen Zhou, Bowen Li, Qipeng Guo, Kai Chen

arXiv · 2603.28342

The Takeaway

Kernel-Smith uses an evolution-in-the-loop agent to generate high-performance Triton code, beating the best proprietary LLMs at low-level operator optimization. This is a significant step toward automated, cross-platform hardware acceleration and high-utilization ML infra.

From the abstract

We present Kernel-Smith, a framework for high-performance GPU kernel and operator generation that combines a stable evaluation-driven evolutionary agent with an evolution-oriented post-training recipe. On the agent side, Kernel-Smith maintains a population of executable candidates and iteratively improves them using an archive of top-performing and diverse programs together with structured execution feedback on compilation, correctness, and speedup. To make this search reliable, we build backend