BotBeat
PRODUCT LAUNCH · DoubleAI · 2026-03-02

DoubleAI's WarpSpeed Achieves Up to 100x Speedup on NVIDIA's cuGraph Library Using AI-Powered Optimization

Key Takeaways

  • WarpSpeed achieved a 3.6x mean speedup across all cuGraph algorithms, with some reaching 100x, while maintaining 100% correctness
  • The system outperformed leading AI coding assistants (Claude Code, Codex, Gemini CLI), which failed on nearly half of the same optimization tasks
  • DoubleAI released doubleGraph as a drop-in replacement for NVIDIA cuGraph that requires no code changes and supports A100, L4, and A10G GPUs
Source: Hacker News, https://www.doubleai.com/research/doubleais-warpspeed-surpassing-expert-written-kernels-at-scale

Summary

DoubleAI has unveiled WarpSpeed, an AI system designed to autonomously optimize GPU performance by rewriting expert-written code. The company demonstrated WarpSpeed by completely rewriting NVIDIA's cuGraph library, the world's most widely used GPU-accelerated graph analytics toolkit, and releasing the result as a drop-in replacement called doubleGraph. The optimized version improves performance across all algorithms: 55% of them achieve speedups above 2x, 18% exceed 10x, and the overall mean speedup is 3.6x on common cloud GPUs including A100, L4, and A10G.
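To make the headline figures easier to interpret, here is a minimal sketch of how a mean speedup and the shares above 2x and 10x could be computed from per-algorithm measurements. The function name `summarize_speedups` and the sample values are hypothetical, and the article does not say whether the reported 3.6x is an arithmetic or geometric mean; the sketch assumes an arithmetic mean.

```python
from statistics import fmean


def summarize_speedups(speedups):
    """Summarize per-algorithm speedups (baseline time / optimized time).

    Assumption: the mean is arithmetic; the source does not specify this.
    """
    if not speedups:
        raise ValueError("need at least one speedup value")
    n = len(speedups)
    return {
        "mean": fmean(speedups),
        # Fraction of algorithms whose speedup exceeds each threshold.
        "share_above_2x": sum(s > 2.0 for s in speedups) / n,
        "share_above_10x": sum(s > 10.0 for s in speedups) / n,
        "max": max(speedups),
    }


# Hypothetical values for illustration only, not DoubleAI's measurements.
example = [1.2, 1.5, 2.5, 3.0, 12.0, 1.1, 4.0, 1.8, 30.0, 2.2]
summary = summarize_speedups(example)
print(summary)
```

A few large outliers can pull an arithmetic mean well above the typical value, so the threshold shares are worth reading alongside the mean.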

What distinguishes WarpSpeed from existing AI coding assistants is its perfect correctness rate and superior optimization capability. While baseline systems like Claude Code, Codex, and Gemini CLI failed on nearly half of the algorithms tested, WarpSpeed successfully optimized every single algorithm in cuGraph with verified correctness. The system generated 576 distinct optimized kernels across three GPU architectures, achieving the kind of exhaustive specialization that would be impractical for human engineering teams.

DoubleAI positions WarpSpeed as an "artificial expert intelligence" system designed to surpass human specialists through both superior skill—finding optimizations experts miss—and unprecedented scale. The company has released doubleGraph as a publicly available library that requires no code changes to use. Graph algorithms represent a particularly challenging optimization target due to irregular memory access patterns and complex data dependencies, making WarpSpeed's achievement especially significant for GPU-accelerated computing.

  • WarpSpeed generated 576 specialized kernels across different GPU architectures and configurations, demonstrating optimization scale beyond human engineering teams

Editorial Opinion

WarpSpeed represents a significant milestone in AI-assisted software optimization, particularly given its claim of 100% correctness, a critical requirement for production systems that existing LLM coding assistants have struggled to meet. The decision to tackle graph algorithms, which lack the regular structure of dense workloads like matrix multiplication, shows ambition in choosing a genuinely hard problem rather than optimizing for benchmarks. However, the real test will be whether this approach generalizes beyond cuGraph to other performance-critical libraries, and whether the underlying trillion-parameter model and "time travel" agentic system can maintain perfect correctness as problem complexity scales.

Machine Learning · MLOps & Infrastructure · AI Hardware · Product Launch · Open Source
