ClawSandbox Security Benchmark Reveals 7 of 9 Attacks Succeed Against AI Agents With Shell Access
Key Takeaways
- ▸ClawSandbox demonstrated successful execution of 7 out of 9 security attacks against an AI agent with shell access in its reference case study
- ▸Four major vulnerability classes were identified: prompt injection, memory poisoning, privilege escalation, and data exfiltration—affecting virtually all AI agents with system-level access
- ▸The security issues are fundamental to LLM-based agents rather than framework-specific, putting popular tools like AutoGPT, CrewAI, Claude Code, Cursor, and Devin at potential risk
Summary
A new open-source security benchmark called ClawSandbox has exposed critical vulnerabilities in AI agents with code execution capabilities, successfully demonstrating 7 out of 9 attack vectors in a reference case study using OpenClaw with Google's Gemini 2.5 Flash. The benchmark tests four major attack classes: prompt injection, memory poisoning, privilege escalation, and data exfiltration—vulnerabilities that affect virtually any AI agent with shell access, file system permissions, or persistent memory.
Developed by researcher Ariansyah (deduu on GitHub), ClawSandbox provides a reusable testing framework that can evaluate security weaknesses across different AI agent platforms. The research emphasizes that these vulnerabilities are not specific to any single framework but are inherent to the architecture of LLM-based agents with system-level access. Popular platforms potentially affected include AutoGPT, CrewAI, LangChain Agents, Claude Code, Cursor, Windsurf, Devin, and any custom agents built using the Model Context Protocol (MCP).
The benchmark's methodology allows developers to test their own AI agents by replacing system prompts and API endpoints in the test scripts. The findings arrive at a critical time as AI coding assistants and autonomous agents gain deeper integration with development environments and production systems. The release includes containerized test infrastructure, documented attack scenarios, and published results to help the AI safety community better understand and mitigate these emerging security risks.
- The open-source benchmark provides reusable testing infrastructure that allows developers to evaluate their own AI agents against these attack vectors
- The research highlights urgent security concerns as AI agents gain increasingly deep integration with development environments and production systems
Editorial Opinion
This research arrives at a pivotal moment when AI coding assistants are rapidly moving from experimental tools to production-critical infrastructure. The 78% success rate (7 of 9 attacks) is alarming and suggests the industry has prioritized capability over security in the race to ship agent-based products. What makes ClawSandbox particularly valuable is its framework-agnostic approach—by demonstrating that these vulnerabilities are architectural rather than implementation-specific, it forces a broader reckoning with AI agent security across the entire ecosystem. The open-source release of testing infrastructure is commendable and should accelerate community-driven solutions to these fundamental safety challenges.


