uGen: LLMs Successfully Generate Complex Microarchitectural Attack Code at Scale

Key Takeaways

▸ LLMs can reliably generate functionally correct microarchitectural attack code when augmented with domain-specific knowledge and multi-agent coordination
▸ Claude Sonnet-4 achieved a 100% success rate for Spectre-v1 PoC generation, significantly outperforming other evaluated models
▸ uGen reduces attack PoC development to $1.25 and under 4 minutes per PoC, versus weeks of manual expert work

Source:

Hacker News: https://arxiv.org/abs/2605.15503

Summary

Researchers have developed uGen, the first LLM-driven framework for automatically generating microarchitectural attack proof-of-concept (PoC) code. Microarchitectural attacks, which exploit processor behavior such as cache timing and speculative execution, have historically been hard to develop because they require deep expertise, environment-specific tuning, and labor-intensive manual implementation. The framework addresses this by using large language models to automate code generation, which could make vulnerability assessment and defensive security research more widely accessible.

uGen employs a retrieval-augmented multi-agent design to overcome knowledge gaps in state-of-the-art LLMs. The research team systematically studied GPT, Claude, and Qwen3, finding that these models frequently misgenerate or misplace critical attack primitives. By injecting domain-specific knowledge through retrieval augmentation and coordinating multiple agents, uGen synthesizes functionally correct microarchitectural attack code tailored to specific processor architectures and defender requirements.

The reported results are strong: Claude Sonnet-4 achieved a 100% success rate on Spectre-v1 PoC generation, while Qwen3-Coder reached 80% on Prime+Probe attacks. The framework produces a working PoC in under four minutes for $1.25 each, a sharp reduction in time and cost compared with manual attack development. This efficiency could accelerate large-scale vulnerability assessment, but it also raises questions about making attack code generation widely accessible.

More broadly, the results suggest that retrieval-augmented multi-agent frameworks can overcome LLM knowledge gaps in specialized, high-precision domains.

Editorial Opinion

This research represents a watershed moment for both defensive security research and dual-use concerns in AI. Automating attack PoC generation could democratize sophisticated vulnerability assessment for resource-constrained defenders, but it equally lowers barriers for malicious actors seeking ready-made exploits. The success of retrieval-augmented multi-agent approaches in closing domain-specific knowledge gaps deserves attention far beyond security, since this pattern could reshape how we deploy LLMs in specialized fields requiring high precision. The security community now faces urgent questions about access control and responsible disclosure.
