Reframes GPU kernel optimization by benchmarking against hardware 'Speed-of-Light' limits rather than software baselines.
March 20, 2026
Original Paper
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
arXiv · 2603.19173
The Takeaway
As AI agents begin to write CUDA kernels, measuring speedup over existing code is a moving target; this benchmark instead provides absolute, hardware-grounded targets. It includes a sandboxed harness and GPU clock locking to prevent reward hacking, both essential for training next-generation LLM-based kernel optimizers.
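The "Speed-of-Light" framing can be understood as a roofline-style lower bound on kernel runtime: a kernel can run no faster than the slower of its compute-bound and memory-bound limits, and proximity to that bound is the score. The sketch below is a minimal illustration of that idea; all hardware numbers and function names are hypothetical placeholders, not figures or APIs from the paper.

```python
# Roofline-style "speed-of-light" (SOL) bound for a GPU kernel.
# All peak-rate values below are hypothetical, not any real Blackwell SKU.

def sol_time_s(flops: float, bytes_moved: float,
               peak_flops: float, peak_bw: float) -> float:
    """Best-case runtime in seconds: the kernel cannot beat the slower of
    its compute limit (flops / peak_flops) and its memory-traffic limit
    (bytes_moved / peak_bw)."""
    return max(flops / peak_flops, bytes_moved / peak_bw)

def sol_efficiency(measured_s: float, flops: float, bytes_moved: float,
                   peak_flops: float, peak_bw: float) -> float:
    """Fraction of speed-of-light achieved; 1.0 means the hardware limit."""
    return sol_time_s(flops, bytes_moved, peak_flops, peak_bw) / measured_s

# Hypothetical kernel: 2e12 FLOPs and 4e9 bytes of traffic on a GPU with
# 1e15 FLOP/s peak compute and 8e12 B/s peak bandwidth.
t_sol = sol_time_s(2e12, 4e9, 1e15, 8e12)          # max(2e-3, 5e-4) = 2e-3 s
eff = sol_efficiency(4e-3, 2e12, 4e9, 1e15, 8e12)  # 2e-3 / 4e-3 = 0.5
```

Unlike a speedup-over-baseline metric, this efficiency score is stable: it does not shift when the software baseline improves, which is the property the benchmark exploits.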
From the abstract
As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present SOL-ExecBench, a benchmark of 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language, diffusion, vision, audio, video, and hybrid architectures, targeting NVIDIA Blackwell GPUs. The benchmark covers forward …