Meta Introduces KernelEvolve: AI Agent That Optimizes Hardware Kernels in Hours Instead of Weeks
Key Takeaways
- KernelEvolve reduces kernel optimization time from weeks to hours by automating the search and refinement process using agentic AI
- The system achieved a 60% inference throughput improvement on NVIDIA GPUs and over 25% training throughput improvement on Meta's MTIA chips, outperforming human expert-optimized kernels
- The technology supports optimization across heterogeneous hardware (NVIDIA, AMD, MTIA) and multiple programming languages (Triton, CUDA, HIP, C++), addressing Meta's infrastructure diversity
Summary
Meta has unveiled KernelEvolve, an autonomous AI agent system that optimizes low-level hardware kernels for diverse AI accelerators including NVIDIA GPUs, AMD GPUs, and Meta's custom MTIA chips. The system treats kernel optimization as a search problem, using an LLM-driven continuous search process to automatically generate and refine production-grade kernels across multiple hardware platforms and programming languages.
As part of Meta's broader Ranking Engineer Agent framework, KernelEvolve dramatically accelerates infrastructure optimization work that traditionally required weeks of manual engineering effort. The system achieved a 60% inference throughput improvement for Meta's Andromeda Ads model on NVIDIA GPUs and over 25% training throughput improvement for an ads model on MTIA chips, completing optimizations in hours rather than weeks.
The technology addresses a critical scaling challenge for AI companies: as the number of AI models and hardware variants multiplies, manual kernel tuning by expert engineers becomes infeasible. KernelEvolve generates kernels in multiple languages including high-level DSLs like Triton and low-level languages like CUDA and HIP, making it broadly applicable across Meta's heterogeneous infrastructure. The research will be presented at ISCA 2026.
As part of Meta's Ranking Engineer Agent ecosystem, KernelEvolve demonstrates how autonomous agents can relieve infrastructure bottlenecks that constrain AI model deployment and iteration.
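Meta has not published KernelEvolve's internals, but the "kernel optimization as a search problem" framing described above generally means a generate-benchmark-refine loop: an LLM proposes a kernel variant, the system compiles and times it, and the feedback steers the next proposal. The sketch below is purely illustrative (all function names are hypothetical, and both the proposal step and the benchmark are stubs), showing the shape of such a loop rather than Meta's implementation:

```python
import random

CANDIDATE_TILES = [32, 64, 128, 256]

def propose_variant() -> int:
    """Stand-in for an LLM proposing a kernel variant.
    Here a 'variant' is just a tile-size choice."""
    return random.choice(CANDIDATE_TILES)

def benchmark(tile: int) -> float:
    """Stand-in for compiling and timing a candidate kernel.
    This toy model pretends throughput peaks at tile=128."""
    return 1.0 - abs(tile - 128) / 256

def search(iterations: int = 50) -> tuple[int, float]:
    """Greedy search loop: keep the best-scoring candidate seen so far."""
    best_tile = CANDIDATE_TILES[0]
    best_score = benchmark(best_tile)
    for _ in range(iterations):
        tile = propose_variant()          # "LLM" generates a candidate
        score = benchmark(tile)           # measure it on hardware
        if score > best_score:            # keep only improvements
            best_tile, best_score = tile, score
    return best_tile, best_score
```

In a real agentic system the proposal step would condition on the previous kernel's source and its profiler feedback, and the benchmark would involve actual compilation and execution on the target accelerator, which is why a single optimization run can still take hours.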
Editorial Opinion
KernelEvolve represents a significant leap in applying AI to infrastructure optimization—using agents to solve the very real bottleneck of kernel tuning across diverse hardware. The 60% throughput improvements are impressive, but the real value lies in freeing expert engineers from weeks of repetitive optimization work, allowing them to focus on higher-level innovation. As AI accelerator diversity increases (NVIDIA, AMD, custom chips), agentic solutions like this may become essential infrastructure, not optional tools.