BotBeat

RESEARCH | UC Berkeley | 2026-02-26

K-Search: New AI Framework Achieves 14x Speedup in GPU Kernel Optimization

Key Takeaways

  • K-Search uses a co-evolving world model to guide LLMs in optimizing GPU kernels, achieving up to 14.3x speedup on complex kernels
  • The framework decouples high-level algorithmic planning from low-level implementation, enabling exploration of non-monotonic optimization paths
  • K-Search outperforms both state-of-the-art automated methods and human-designed solutions on the GPUMode TriMul benchmark for H100 GPUs
Source: Hacker News, https://arxiv.org/abs/2602.19128

Summary

Researchers from UC Berkeley and affiliates have introduced K-Search, a framework that uses large language models to automatically optimize GPU kernels. The system targets a bottleneck in machine learning infrastructure by treating LLMs not merely as code generators but as strategic planners that can navigate complex optimization paths. Traditional automated approaches struggle with multi-step structural transformations and often discard promising strategies because of temporary implementation flaws.

K-Search's innovation lies in its "co-evolving world model" that replaces static search heuristics with dynamic, LLM-guided exploration. This approach explicitly separates high-level algorithmic planning from low-level code implementation, allowing the system to pursue non-monotonic optimization paths while remaining resilient to intermediate bugs or inefficiencies. The framework leverages the domain knowledge encoded in LLMs to actively explore the optimization space rather than relying on rigid heuristics.
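The separation of planning from implementation described above can be illustrated with a small toy sketch. This is not the K-Search algorithm. The `Plan` class, the `search` function, the score multipliers, and the latencies in `toy_implement` are all hypothetical, and a lookup table stands in for both the LLM and the GPU benchmark. The sketch shows only one idea: a plan that fails once is penalized rather than dropped, so a later attempt can still pay off.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Plan:
    """A high-level optimization idea, tracked separately from any concrete code."""
    name: str
    score: float                      # hypothetical "world model" estimate of promise
    attempts: int = 0
    best_latency: Optional[float] = None


def search(
    plans: List[Plan],
    implement: Callable[[Plan, int], Optional[float]],
    budget: int,
    max_attempts: int = 3,
) -> Optional[Tuple[str, float]]:
    """Repeatedly try the most promising plan; return (plan name, best latency)."""
    best: Optional[Tuple[str, float]] = None
    for _ in range(budget):
        live = [p for p in plans if p.attempts < max_attempts]
        if not live:
            break
        plan = max(live, key=lambda p: p.score)
        latency = implement(plan, plan.attempts)   # None means the attempt was buggy
        plan.attempts += 1
        if latency is None:
            plan.score *= 0.8                      # penalize the plan, but keep it alive
            continue
        if plan.best_latency is None or latency < plan.best_latency:
            plan.best_latency = latency
        else:
            plan.score *= 0.5                      # no progress: lower the estimate
        if best is None or latency < best[1]:
            best = (plan.name, latency)
    return best


def toy_implement(plan: Plan, attempt: int) -> Optional[float]:
    """Hypothetical outcomes: 'fuse_loops' is buggy on the first try, fast afterwards."""
    outcomes = {
        "tile_only": [8.0, 8.0, 8.0],
        "fuse_loops": [None, 5.0, 5.0],
    }
    return outcomes[plan.name][attempt]


def make_plans() -> List[Plan]:
    """Fresh plan objects, since search() mutates them."""
    return [Plan("tile_only", 0.5), Plan("fuse_loops", 0.9)]


print(search(make_plans(), toy_implement, budget=4))
print(search(make_plans(), toy_implement, budget=4, max_attempts=1))
```

With the default `max_attempts=3`, the search retries `fuse_loops` after its buggy first attempt and ends at a latency of 5.0. With `max_attempts=1`, which mimics discarding a strategy at its first failure, it settles for `tile_only` at 8.0.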

Benchmark results demonstrate substantial performance gains across diverse kernel types. On complex kernels from FlashInfer—including Group Query Attention (GQA), Multi-head Latent Attention (MLA), and Mixture of Experts (MoE)—K-Search achieved an average 2.10x improvement over state-of-the-art evolutionary search methods, with peak gains reaching 14.3x on MoE kernels. The system also achieved state-of-the-art performance on the GPUMode TriMul task for H100 GPUs, reaching 1030 microseconds and surpassing both previous automated solutions and human-designed implementations.

This research represents a significant step toward automating kernel optimization, a traditionally expert-intensive process that becomes increasingly critical as GPU architectures evolve rapidly. By enabling LLMs to reason about optimization strategies at a higher level of abstraction, K-Search could accelerate development cycles for machine learning systems and reduce the expertise barrier for achieving peak hardware performance.

The approach addresses a critical limitation of existing methods, which treat LLMs as simple code generators within heuristic-guided loops.

Editorial Opinion

K-Search represents a fascinating evolution in how we leverage LLMs for systems optimization—moving from pattern-matching code generation to strategic reasoning about performance trade-offs. The ability to maintain promising optimization trajectories despite temporary implementation failures mirrors how human experts approach kernel tuning, suggesting we're approaching genuinely intelligent automated optimization. The 14x gains on complex kernels aren't just incremental improvements; they indicate the framework is discovering optimization strategies that differ fundamentally from both evolutionary search and human intuition, which could reshape how we think about the boundary between automated and expert-driven performance engineering.

Large Language Models (LLMs), Machine Learning, MLOps & Infrastructure, AI Hardware, Science & Research
