BotBeat
ByteDance · RESEARCH · 2026-03-05

ByteDance Unveils CUDA Agent: RL System Achieves 2.11x Speedup Over PyTorch Compiler

Key Takeaways

  • CUDA Agent achieves a 2.11x average speedup over PyTorch's torch.compile with a 98.8% pass rate on the KernelBench benchmark
  • The system combines scalable data synthesis (6K training tasks), a skill-augmented environment, and multi-stage RL training with anti-reward-hacking controls
  • The open-source release includes the CUDA-Agent-Ops-6K dataset on Hugging Face and the complete agent workflow on GitHub
Source: Hacker News (https://cuda-agent.github.io/)

Summary

ByteDance Seed and Tsinghua University's AIR have released CUDA Agent, a large-scale agentic reinforcement learning system designed to automatically generate and optimize high-performance CUDA kernels. The system combines scalable data synthesis, a skill-augmented execution environment, and stable long-horizon RL training to achieve state-of-the-art performance on the KernelBench benchmark. CUDA Agent demonstrates a 98.8% overall pass rate and delivers 2.11x average speedup compared to PyTorch's torch.compile, with 96.8% of generated kernels running faster than the baseline compiler.

The system introduces a three-stage data pipeline that synthesizes 6,000 high-quality training tasks (CUDA-Agent-Ops-6K dataset) by mining seed operators from PyTorch and Transformers libraries, combining them into fused operations, and filtering through execution-driven validation. The agent operates in a ReAct-style workflow with coding tools and anti-reward-hacking controls, requiring generated kernels to pass correctness checks across multiple inputs and achieve at least 5% speedup over torch.compile. Training uses a staged approach with single-turn PPO warm-up followed by multi-turn agentic RL with Rejection Fine-Tuning.
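The acceptance rule described above, that a generated kernel must produce correct outputs across multiple test inputs and beat torch.compile by at least 5%, can be sketched as a simple reward gate. This is an illustrative reconstruction only, not ByteDance's actual code: the function name `kernel_reward`, the tolerance, and the use of plain Python callables as stand-ins for compiled kernels are all assumptions.

```python
import time
import random

def kernel_reward(candidate, reference, input_sets,
                  min_speedup=1.05, tol=1e-6, reps=5):
    """Gate a candidate kernel: zero reward unless it matches the reference
    on EVERY test input and beats the baseline by at least min_speedup.
    (Illustrative sketch; threshold of 1.05 mirrors the 5% rule in the text.)"""
    # Correctness across multiple inputs is an anti-reward-hacking control:
    # a kernel cannot earn reward by special-casing a single test case.
    for xs in input_sets:
        if abs(candidate(*xs) - reference(*xs)) > tol:
            return 0.0

    # Best-of-reps wall-clock timing for each implementation.
    def best_time(fn):
        best = float("inf")
        for _ in range(reps):
            t0 = time.perf_counter()
            for xs in input_sets:
                fn(*xs)
            best = min(best, time.perf_counter() - t0)
        return best

    speedup = best_time(reference) / best_time(candidate)
    # Reward only when the required speedup threshold is met.
    return speedup if speedup >= min_speedup else 0.0

# Toy stand-ins: a reference op and an algebraically "fused" candidate.
reference = lambda a, b: (a * b) + (a * b)
candidate = lambda a, b: 2 * a * b

inputs = [(random.random(), random.random()) for _ in range(100)]
r = kernel_reward(candidate, reference, inputs)
```

In the real system the candidate would be an actual compiled CUDA kernel timed on GPU, but the gating logic, correctness first, then a hard speedup threshold, is the part that matters for keeping the RL signal honest.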

On KernelBench's hierarchical evaluation, CUDA Agent achieved 100% faster-than-compile rates on both Level-1 and Level-2 splits, and 92% on the challenging Level-3 split, outperforming strong proprietary models. The research team has open-sourced both the training dataset on Hugging Face and the agent workflow on GitHub, enabling reproducible research in RL-based GPU kernel optimization. This work addresses a critical bottleneck in deep learning infrastructure by automating a task that traditionally requires deep hardware expertise.

  • Achieves 100% faster-than-compile rate on KernelBench Level-1 and Level-2, and 92% on hardest Level-3 split
  • Automates GPU kernel optimization that traditionally requires specialized hardware expertise
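The headline metrics reported above (pass rate, faster-than-compile rate, average speedup) can be aggregated from per-task results as in the generic sketch below. This is not KernelBench's evaluation harness; the `TaskResult` structure and field names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TaskResult:
    passed: bool    # kernel produced correct outputs on all test inputs
    speedup: float  # runtime(torch.compile baseline) / runtime(generated kernel)

def summarize(results):
    """Aggregate per-task results into the benchmark's headline metrics."""
    n = len(results)
    pass_rate = sum(r.passed for r in results) / n
    # "Faster than compile" counts only kernels that are both correct
    # and strictly faster than the baseline.
    faster_rate = sum(r.passed and r.speedup > 1.0 for r in results) / n
    passed = [r.speedup for r in results if r.passed]
    avg_speedup = sum(passed) / len(passed)
    return pass_rate, faster_rate, avg_speedup

# Toy results: three correct kernels (two faster than baseline), one failure.
results = [TaskResult(True, 2.4), TaskResult(True, 1.1),
           TaskResult(True, 0.9), TaskResult(False, 0.0)]
pass_rate, faster_rate, avg = summarize(results)
# pass_rate = 0.75, faster_rate = 0.5
```

Note that the pass rate and faster-than-compile rate can diverge, which is why the article reports both: a kernel can be correct yet slower than torch.compile.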

Editorial Opinion

CUDA Agent represents a significant step toward democratizing GPU kernel optimization through AI, potentially accelerating the development cycle for high-performance deep learning systems. The system's careful design of anti-reward-hacking measures and robust verification protocols suggests the research team has learned from challenges in code generation RL. However, the real test will be whether these synthesized kernels generalize to production workloads beyond benchmark tasks, and whether the 2.11x speedup justifies the computational cost of RL training for organizations without ByteDance's infrastructure scale.

Reinforcement Learning · Machine Learning · MLOps & Infrastructure · AI Hardware · Open Source

