BotBeat
RESEARCH | Anthropic | 2026-04-16

AI Multi-Agent System Achieves 38% GPU Kernel Speedup in Collaboration with NVIDIA

Key Takeaways

  • A multi-agent system achieved a 38% geometric mean speedup on 235 CUDA kernel optimization problems, accomplishing in weeks what typically requires months or years of specialized engineer work
  • The autonomous system independently learned to use NVIDIA's SOL-ExecBench benchmarking pipeline, creating self-directed testing and optimization loops without human intervention
  • This breakthrough demonstrates multi-agent systems' capacity to explore broader solution spaces beyond manual, piecemeal optimization approaches, unlocking performance gains across entire systems
Source: Hacker News, https://cursor.com/blog/multi-agent-kernels

Summary

Anthropic has demonstrated a multi-agent system that achieved a 38% geometric mean speedup on CUDA kernel optimization tasks, in collaboration with NVIDIA. Operating autonomously over three weeks, the system optimized 235 GPU kernels targeting Blackwell processors, with the kernels drawn from production models including DeepSeek, Qwen, Gemma, and Stable Diffusion. Improvements of this size typically require months or years of work from highly experienced kernel engineers, which makes the result a notable validation of multi-agent system capabilities.
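For readers unfamiliar with the metric, the following is a minimal sketch of how a geometric mean speedup could be computed over a set of kernels. The function names and the timings are illustrative assumptions, not data from the experiment, and the sketch assumes "38% speedup" means a 1.38x ratio of baseline time to optimized time.

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-kernel speedups (baseline time / optimized time)."""
    if not baseline_ms or len(baseline_ms) != len(optimized_ms):
        raise ValueError("need two non-empty lists of equal length")
    # Average the logs of the ratios, then exponentiate, to avoid overflow
    # when multiplying many ratios together.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))


def runtime_fraction(speedup):
    """Fraction of the original runtime that remains after a given speedup."""
    return 1.0 / speedup


# Hypothetical timings in milliseconds, for illustration only.
baseline = [2.0, 4.0, 1.0]
optimized = [1.0, 4.0, 0.5]
speedup = geometric_mean_speedup(baseline, optimized)

# Under the 1.38x reading, each kernel takes about 72% of its original time
# on average, i.e. runtime drops by roughly 28%, not 38%.
remaining = runtime_fraction(1.38)
```

The geometric mean is the usual choice for averaging ratios because a single very large speedup on one kernel cannot dominate the result the way it would in an arithmetic mean.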

The multi-agent system employed a planner agent that coordinated autonomous workers, distributing and rebalancing optimization work based on performance metrics. Notably, the system independently learned to call NVIDIA's SOL-ExecBench benchmarking pipeline, creating an automated loop in which kernels were continuously tested, debugged, and optimized without developer intervention. The coordination protocol was specified entirely in a single Markdown file, demonstrating the system's ability to interpret and execute complex technical instructions autonomously.
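The planner-and-workers loop described above might look roughly like the sketch below. This is not the system's actual protocol, which the article says lives in a Markdown file; every name here (`KernelTask`, `plan_round`, `run_rounds`, `fake_worker`) is hypothetical, and the benchmark step is a stand-in function rather than a call to SOL-ExecBench.

```python
from dataclasses import dataclass, field


@dataclass
class KernelTask:
    name: str
    baseline_ms: float
    best_ms: float = field(init=False, default=0.0)
    attempts: int = 0

    def __post_init__(self):
        # Until a faster variant is found, the best known time is the baseline.
        self.best_ms = self.baseline_ms

    @property
    def speedup(self):
        return self.baseline_ms / self.best_ms


def plan_round(tasks, workers):
    """Planner step: send the available workers to the least-improved kernels."""
    return sorted(tasks, key=lambda t: t.speedup)[:workers]


def run_rounds(tasks, propose_and_benchmark, workers, rounds):
    """Repeat plan -> propose -> benchmark, keeping only faster candidates."""
    for _ in range(rounds):
        for task in plan_round(tasks, workers):
            # A worker returns a measured time, or None if the candidate
            # failed to build or produced wrong results.
            candidate_ms = propose_and_benchmark(task)
            task.attempts += 1
            if candidate_ms is not None and candidate_ms < task.best_ms:
                task.best_ms = candidate_ms
    return tasks


def fake_worker(task):
    """Stand-in for a worker agent: pretends each attempt is 10% faster."""
    return task.best_ms * 0.9


tasks = [KernelTask("kernel_a", 10.0), KernelTask("kernel_b", 10.0)]
run_rounds(tasks, fake_worker, workers=1, rounds=2)
```

With one worker and two rounds, the planner sends the worker to `kernel_a` first and then to `kernel_b`, since `kernel_b` is the least-improved kernel after the first round. That is one simple way to read "rebalance based on performance metrics"; the real system may weigh other signals.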

The experiment tested the multi-agent system's ability to explore solution spaces beyond traditional manual kernel optimization, which typically tunes individual math operations separately rather than across an entire system. By working at multiple abstraction levels, from CUDA C with inline PTX to higher-level languages, the system addressed a long tail of kernel optimization problems that had previously been impractical to solve with existing approaches. This could enable providers to serve larger, more capable AI models with reduced latency and cost.

  • Faster GPU kernels directly translate to improved GPU utilization, reduced energy consumption, lower latency, and reduced cost-per-token for AI model serving at scale

Editorial Opinion

This achievement represents a compelling demonstration of multi-agent systems' potential in solving complex, open-ended technical problems that have long resisted automation. The ability to autonomously optimize GPU kernels at scale could have profound implications for AI infrastructure efficiency and accessibility, particularly as model serving costs become increasingly critical to AI industry economics. However, the reliance on proprietary benchmarking and controlled experimental conditions warrants independent verification of these results in broader production environments.

Reinforcement Learning, AI Agents, Machine Learning, AI Hardware
