UC Berkeley ADRS Project Explores Memory Management for AI-Driven GPU Code Generation
Key Takeaways
- Memory in optimization agents should be managed like a cache, not a notebook: its utility is measured by whether retrieval saves more search than it costs in context
- MakoraGenerate demonstrates agentic code generation at scale, producing GPU kernels for multiple accelerator platforms with automated correctness validation and performance benchmarking
- The research frames kernel generation as a verifiable optimization problem in which iterative search and measured rewards drive agent improvement
Summary
Researchers from UC Berkeley's AI-Driven Research for Systems (ADRS) project have published insights into how agentic systems can manage memory more effectively when generating GPU kernels. The team developed MakoraGenerate, a multi-agent evolutionary system that generates, compiles, validates, and benchmarks GPU kernels across NVIDIA, AMD, TPU, and NPU architectures. Rather than treating memory as a simple long-term recall mechanism, the researchers argue that for optimization agents under strict compute budgets, memory should function more like a cache: retrieved information must justify its inclusion by sparing the agent a costly rediscovery of coding patterns.
The core challenge the ADRS team addresses is a fundamental trade-off in agentic systems: every element competing for context space (current code, compiler errors, profiler output, documentation, prior kernels) comes at a cost. In GPU kernel generation, an iterative search problem where each candidate must be evaluated against multiple criteria, adding memory that crowds out locally relevant evidence actually harms performance. The researchers propose that the key question isn't how much memory an agent can access, but rather what belongs in the agent's working set at each step—treating memory management as a cache optimization problem rather than a knowledge storage problem.
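The working-set question above can be made concrete with a small sketch. The following Python is illustrative only and is not taken from the ADRS work: the `MemoryItem` structure, the `expected_saving` estimates, and the greedy admission rule are all assumptions chosen to show what "retrieval must save more search than it costs in context" could look like in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """A candidate piece of context (hypothetical structure, for illustration)."""
    name: str
    tokens: int             # context cost of including the item (must be > 0)
    expected_saving: float  # estimated search effort saved, in token-equivalents


def select_working_set(items, token_budget, cost_per_token=1.0):
    """Greedily admit items whose estimated saving exceeds their context cost.

    Items are ranked by saving per token. An item is admitted only if it has
    positive net utility and still fits in the remaining budget.
    """
    ranked = sorted(items, key=lambda it: it.expected_saving / it.tokens, reverse=True)
    chosen, remaining = [], token_budget
    for item in ranked:
        net_utility = item.expected_saving - cost_per_token * item.tokens
        if net_utility > 0 and item.tokens <= remaining:
            chosen.append(item)
            remaining -= item.tokens
    return chosen


# Made-up numbers: the large archive of old kernels costs more than it saves.
candidates = [
    MemoryItem("current_kernel", tokens=400, expected_saving=2000.0),
    MemoryItem("compiler_error", tokens=100, expected_saving=900.0),
    MemoryItem("old_kernel_archive", tokens=1500, expected_saving=600.0),
    MemoryItem("tiling_pattern_note", tokens=200, expected_saving=700.0),
]
working_set = select_working_set(candidates, token_budget=800)
print([item.name for item in working_set])
```

In this toy setup the archive is excluded even though it holds the most stored experience, because its estimated saving does not cover its context cost. In practice the hard part is estimating `expected_saving`, which this sketch simply takes as given.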
MakoraGenerate instantiates this philosophy by pairing an LLM with an automated feedback loop: the agent proposes candidates, and the system validates their correctness against PyTorch reference implementations, profiles runtime performance, and uses measured speedup as the reward signal. The system maintains a ranked population and applies diversity-based selection to inherit effective patterns while avoiding premature convergence. This architecture demonstrates that for optimization agents with hard per-step budget constraints and verifiable objectives, sophisticated memory management can be as important as the underlying model.
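The propose, validate, benchmark, and select loop described above can be sketched in a few lines. This is a toy model under stated assumptions, not MakoraGenerate's implementation: a random mutation of a `tile` parameter stands in for the LLM proposer, a tiled sum stands in for a GPU kernel, `simulated_time` is an invented cost model rather than a hardware measurement, and "diversity" is reduced to keeping one candidate per distinct tile value.

```python
import random
from dataclasses import dataclass

BASELINE_TIME = 50.0  # assumed runtime of the reference implementation


@dataclass(frozen=True)
class Candidate:
    tile: int       # hypothetical tunable standing in for a generated kernel
    speedup: float  # reward: baseline time divided by candidate time


def reference(xs):
    """Reference implementation that every candidate must match."""
    return sum(xs)


def run_candidate(tile, xs):
    """Toy 'kernel': a tiled sum that wrongly drops any partial final tile."""
    usable = len(xs) - len(xs) % tile
    return sum(sum(xs[i:i + tile]) for i in range(0, usable, tile))


def simulated_time(tile):
    """Stand-in cost model; a real system would time the kernel on hardware."""
    return 10.0 + abs(tile - 32)


def evaluate(tile, xs):
    """Validate against the reference, then score. Returns None if incorrect."""
    if run_candidate(tile, xs) != reference(xs):
        return None
    return Candidate(tile, BASELINE_TIME / simulated_time(tile))


def propose(tile, rng):
    """Stand-in for the LLM proposer: a random local edit of the parent."""
    return rng.choice([tile * 2, max(1, tile // 2), tile + 1, max(1, tile - 1)])


def select_diverse(population, size):
    """Rank by speedup, keeping at most one candidate per distinct tile."""
    best_by_tile = {}
    for cand in population:
        kept = best_by_tile.get(cand.tile)
        if kept is None or cand.speedup > kept.speedup:
            best_by_tile[cand.tile] = cand
    ranked = sorted(best_by_tile.values(), key=lambda c: c.speedup, reverse=True)
    return ranked[:size]


def search(steps=200, population_size=4, seed=0):
    rng = random.Random(seed)
    xs = list(range(64))
    population = [evaluate(1, xs)]  # tile=1 is always correct
    for _ in range(steps):
        parent = rng.choice(population)
        child = evaluate(propose(parent.tile, rng), xs)
        if child is not None:  # incorrect candidates earn no reward
            population = select_diverse(population + [child], population_size)
    return population


final_population = search()
print([(c.tile, round(c.speedup, 2)) for c in final_population])
```

Candidates that fail the correctness check never enter the population, so reward is only ever paid for verified speedups. Because selection keeps the top-ranked entries, the best speedup in the population never decreases from one step to the next.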
For agents operating under strict compute budgets, aggressive working-set management may be more valuable than maximizing access to stored experience.
Editorial Opinion
This work addresses a subtle but critical challenge in building practical agentic systems: not all information is equally valuable in context, and poorly managed memory can actively harm reasoning and search efficiency. The framing of memory as cache-style context management—rather than long-term recall—offers a valuable conceptual shift for practitioners building optimization agents. By pairing rigorous automated evaluation with agentic search, the ADRS project demonstrates a path toward using AI to accelerate systems research in domains with clear verification criteria.



