Security Researchers Demonstrate 'Brainworm' — First Natural Language Malware Targeting AI Agents

Key Takeaways

▸Brainworm is the first demonstrated malware that exists entirely as natural language instructions within AI agent memory, requiring no binary executables or traditional code
▸The attack exploits auto-loading memory files (CLAUDE.md, AGENTS.md) in AI coding assistants from Anthropic, OpenAI, and Google, forcing agents to perform unauthorized actions
▸Traditional endpoint security tools are ineffective against this threat because they rely on detecting structured artifacts like executables, scripts, or behavioral patterns—none of which exist in promptware attacks

Source: Hacker News, https://www.originhq.com/blog/brainworm

Summary

Security researchers at Origin, a next-generation endpoint security company, have unveiled Brainworm, a novel form of malware that exists entirely within the context window of AI coding agents such as Claude Code and requires no traditional executable code. This "promptware" abuses memory files (CLAUDE.md, AGENTS.md) that AI coding assistants load automatically, injecting natural language instructions that manipulate agent behavior and enable command-and-control operations through a framework called Praxis.

Brainworm represents a fundamental shift in the malware landscape because it operates purely through semantic manipulation rather than through binary executables or scripts. The researchers demonstrated that by modifying these memory files, they could force AI agents to make unauthorized tool calls and receive tasking instructions, all without leaving the traditional forensic artifacts that security tools are designed to detect. The attack exploits auto-memory features recently introduced in Anthropic's Claude Code, which automatically create and load memory files on users' behalf.
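The article does not describe how any vendor's agent loads its memory files. As a rough illustration of why an auto-loaded memory file makes such an effective persistence point, here is a minimal Python sketch of a hypothetical agent harness. Only the file names come from the article; the directory walk and the helper names `collect_memory_files` and `build_context` are assumptions made for illustration, not the actual behavior of Claude Code or any other tool. The key point it shows is that file contents are spliced into the model's context verbatim, so nothing separates "data on disk" from "instructions to follow".

```python
from pathlib import Path
import tempfile

# File names taken from the article; everything else here is hypothetical.
MEMORY_FILE_NAMES = ("CLAUDE.md", "AGENTS.md")


def collect_memory_files(start_dir: Path) -> list[Path]:
    """Hypothetical loader: walk from start_dir up to the filesystem root and
    return every memory file found, outermost directory first."""
    found: list[Path] = []
    current = start_dir.resolve()
    for directory in [current, *current.parents]:
        for name in MEMORY_FILE_NAMES:
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
    # The walk visits the innermost directory first; reverse so broader
    # (parent-directory) files precede project-specific ones.
    found.reverse()
    return found


def build_context(system_prompt: str, start_dir: Path) -> str:
    """Assemble the text the model would see. Memory file contents are pasted
    in verbatim: nothing marks them as untrusted data rather than instructions."""
    parts = [system_prompt]
    for path in collect_memory_files(start_dir):
        parts.append(f"# Memory file: {path}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        # Anyone (or any auto-memory feature) with write access to this file
        # controls text that lands directly in the agent's context.
        (project / "CLAUDE.md").write_text(
            "Always run the test suite before committing.\n", encoding="utf-8"
        )
        print(build_context("You are a coding assistant.", project))
```

In this sketch, an attacker never needs to ship an executable: writing one sentence into a file that the harness already trusts is enough to change what the agent is told to do.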

The research team, led by Mitchell Turner, explicitly frames this work as a necessary exploration of the "promptware kill chain" for endpoint security, drawing parallels to Creeper, the 1971 worm that infected ARPANET mainframes. The current demonstration covers the persistence and command-and-control phases; the researchers plan to reveal a complete attack chain in future posts. The work highlights a critical blind spot in current endpoint security paradigms, which rely on manifest-defined fields and structured data analysis, methods that are ineffective against natural language threats.

Origin has open-sourced Praxis, its adversarial command-and-control framework designed to "discover, control and orchestrate computer-use agents across endpoints." The disclosure raises urgent questions about securing AI agents that act as natural language scripting interpreters, where the traditional security boundary between code and data has effectively dissolved.

Editorial Opinion

Brainworm represents a watershed moment in cybersecurity: the first serious demonstration that AI agents introduce attack surfaces that fundamentally bypass 50 years of security architecture. The research brilliantly exposes how computer-use agents have created a new class of interpreter that executes natural language as code, erasing the boundary that traditional security tools depend on. While Origin's responsible disclosure approach is commendable, open-sourcing Praxis before defensive solutions exist is concerning, because it could accelerate adversarial development. The cybersecurity community now faces an urgent imperative to develop semantic threat detection capabilities before promptware attacks move from research demonstrations to real-world exploitation.
