New Open-Source Benchmark Reveals 87% of AI Agent Tool-Use Attacks Succeed by Default; MCPGuard Proxy Cuts the Rate to ~10%
Key Takeaways
- Default AI agent configurations are highly vulnerable, with 87% of targeted attacks succeeding, driven primarily by over-permissive tool access and weak refusal mechanisms
- Configuration hardening (denying dangerous tools like exec and web access) provides meaningful but incomplete protection, reducing attack success to 37%, indicating that configuration alone is insufficient
- MCPGuard's chain-detection and output-scanning layers (L4) are the load-bearing components of defense, reducing residual attack success to ~10% and providing an 88-95% relative improvement over baseline
- Vulnerability patterns (the 'Phrase Gap') persist across models, including frontier models like Claude, suggesting that phrasing-aware defenses are necessary complements to configuration hardening
Summary
A new open-source security benchmark has exposed significant vulnerabilities in AI agents with tool access, finding that 87% of 30 carefully crafted attack prompts successfully compromised a default-configured agent. The research, published with full reproducibility artifacts, demonstrates a three-phase defense strategy: hardening tool configurations reduced attack success to 37%, while adding MCPGuard—a lightweight five-layer Python proxy—brought the rate down to approximately 10% (95% CI [4%-20%]). The study tested multiple models (Qwen 3.5-Small and Claude Sonnet 4.6) across six attack categories including exfiltration, prompt injection, privilege escalation, and social engineering. All code, attack prompts, test runner, and raw results are publicly available, enabling other developers to evaluate and improve AI agent security.
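The wide 95% CI of [4%-20%] around the ~10% residual rate reflects the small sample of 30 attack prompts. The write-up does not state which interval method the study used; the sketch below uses a Wilson score interval, a common choice for small binomial samples, with an illustrative count of 3 successes out of 30 (one plausible reading of "~10% of 30") rather than the study's actual tally:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 gives an approximate 95% interval.
    """
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Illustrative numbers only: 3 successful attacks out of 30 prompts (~10%).
lo, hi = wilson_ci(3, 30)
```

With these assumed counts the interval comes out in the same ballpark as the reported one, which is why a small benchmark of 30 prompts cannot pin the residual rate down more tightly.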
Editorial Opinion
This research fills a critical gap in AI safety by providing a reproducible, open-source framework for measuring and defending against real-world agent compromise attacks. The clear demonstration that no single defense layer suffices—and that proxy-based monitoring at the LLM-tool boundary is essential—should shift the security conversation from configuration alone to layered defense. The fact that attack success rates remain non-trivial even with hardening and proxy defense underscores the ongoing challenge of building trustworthy tool-using AI agents.
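To make "proxy-based monitoring at the LLM-tool boundary" concrete, here is a minimal sketch of an output-scanning check in the spirit of MCPGuard's L4 layer: tool output is scanned for secret-like strings before being returned to the model. All function names and regex patterns here are hypothetical illustrations, not MCPGuard's actual rules or API:

```python
import re

# Hypothetical secret-like patterns an output-scanning layer might flag.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),    # PEM private key header
    re.compile(r"(?i)(api[_-]?key|token)\s*[:=]\s*\S+"),  # key/value credential
]

def scan_tool_output(output: str) -> tuple[str, bool]:
    """Redact secret-like substrings in tool output.

    Returns the (possibly redacted) text and a flag indicating whether
    anything matched, so the proxy can log or block the exchange.
    """
    flagged = False
    for pattern in SECRET_PATTERNS:
        if pattern.search(output):
            flagged = True
            output = pattern.sub("[REDACTED]", output)
    return output, flagged
```

A proxy sitting between the agent and its tools would run a check like this on every tool response, which is what lets it catch exfiltration attempts that configuration hardening alone misses.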



