SOL-ExecBench: New Benchmark Measures GPU Kernel Optimization Against Hardware Limits Rather Than Software Baselines
Key Takeaways
- SOL-ExecBench introduces Speed-of-Light benchmarking for GPU kernels, measuring performance against hardware efficiency bounds rather than mutable software baselines
- The benchmark covers 235 CUDA kernel optimization problems from 124 production AI models across diverse architectures and precision formats, targeting NVIDIA Blackwell GPUs
- Includes anti-gaming measures and a sandboxed evaluation harness to support robust assessment of agentic AI kernel optimizers
Summary
Researchers have introduced SOL-ExecBench, a comprehensive benchmark for evaluating GPU kernel optimization that shifts evaluation methodology from comparing against software baselines to measuring performance against analytically derived Speed-of-Light (SOL) hardware efficiency bounds. The benchmark comprises 235 CUDA kernel optimization problems extracted from 124 production and emerging AI models spanning language models, diffusion, vision, audio, video, and hybrid architectures, all targeting NVIDIA's Blackwell GPUs. It covers forward and backward workloads across multiple precision formats including BF16, FP8, and NVFP4, with kernels designed to leverage Blackwell-specific capabilities.
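The article does not reproduce the paper's exact derivation of the Speed-of-Light bound, but such bounds are conventionally computed roofline-style: a kernel can finish no faster than the slower of its compute phase and its memory phase at peak hardware rates. A minimal sketch of that idea, with placeholder peak figures rather than actual Blackwell specifications:

```python
def sol_time_us(flops, bytes_moved, peak_tflops, peak_bw_gbs):
    """Roofline-style speed-of-light execution time in microseconds.

    Illustrative only: the real benchmark derives its bounds analytically
    per kernel; the peak rates here are placeholders, not Blackwell specs.
    """
    t_compute = flops / (peak_tflops * 1e12)    # seconds at peak compute
    t_memory = bytes_moved / (peak_bw_gbs * 1e9)  # seconds at peak bandwidth
    return max(t_compute, t_memory) * 1e6

# A GEMM doing 1e12 FLOPs while moving 1e8 bytes on a hypothetical
# 1000 TFLOP/s, 8000 GB/s part is compute-bound:
print(sol_time_us(1e12, 1e8, 1000, 8000))  # → 1000.0 (microseconds)
```

Whichever of the two terms dominates tells you whether the kernel is compute-bound or memory-bound at the hardware limit, which is what makes the bound a fixed target independent of any software baseline.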
The benchmark introduces a SOL Score metric that quantifies how much of the gap between a baseline and hardware Speed-of-Light bounds a candidate kernel closes, providing a fixed target for truly hardware-efficient optimization. To ensure robust evaluation of agentic AI systems that generate and optimize kernels, the benchmark includes a sandboxed harness with GPU clock locking, L2 cache clearing, isolated subprocess execution, and static analysis checks against reward-hacking strategies. This represents a fundamental reframing of GPU kernel benchmarking from a relative comparison problem to an absolute measure of proximity to theoretical hardware efficiency limits.
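The SOL Score described above is a gap-closing metric. The paper's precise formula is not quoted in this summary, but the natural reading of "how much of the gap between a baseline and the SOL bound a candidate closes" is the following ratio, sketched here as an assumption:

```python
def sol_score(t_baseline_us, t_candidate_us, t_sol_us):
    """Fraction of the baseline-to-SOL latency gap closed by a candidate.

    1.0 means the candidate kernel reaches the hardware bound; 0.0 means
    no improvement over the baseline. Illustrative form only -- the exact
    definition is given in the SOL-ExecBench paper.
    """
    gap = t_baseline_us - t_sol_us
    if gap <= 0:
        raise ValueError("baseline must be slower than the SOL bound")
    return (t_baseline_us - t_candidate_us) / gap

# A candidate at 1.2 ms against a 2.0 ms baseline and a 1.0 ms SOL bound
# closes 80% of the gap:
print(sol_score(2000.0, 1200.0, 1000.0))  # → 0.8
```

Because the SOL bound is fixed by the hardware, this score cannot be inflated by choosing a weaker software baseline, which is the reframing the authors emphasize.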
Editorial Opinion
SOL-ExecBench addresses a critical gap in how AI kernel optimization is evaluated. By shifting from relative software-baseline comparisons to absolute hardware efficiency bounds, this benchmark better aligns incentives for developing genuinely efficient kernels that approach physical hardware limits. As agentic AI systems become more capable at code generation and optimization, having principled, robust benchmarks with anti-gaming protections will be essential for measuring real progress rather than spurious improvements.