Report: 87% of AI-Generated Pull Requests Ship Security Vulnerabilities—No Agent Produced Fully Secure Code
Key Takeaways
- All three AI coding agents introduced the same four vulnerability classes regardless of model or application, suggesting systemic gaps in how LLMs approach security
- AI agents lack threat modeling and security context: they implement features as described without considering attacker perspectives, which leads to inconsistent security coverage
- Partial security implementation is more dangerous than none: agents added JWT tokens and middleware but applied them inconsistently, creating an illusion of protection
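The "partial implementation" pattern above can be sketched in a few lines. This is a hypothetical illustration, not code from the report: all names (`require_auth`, `routes`, `delete_profile`) are invented, and the "middleware" is reduced to a decorator so the coverage gap is visible in isolation.

```python
def require_auth(handler):
    """Decorator marking a handler as requiring a valid token."""
    def wrapped(request):
        if not request.get("token"):
            return {"status": 401, "body": "unauthorized"}
        return handler(request)
    wrapped.protected = True  # marker used by the coverage check below
    return wrapped

@require_auth
def get_profile(request):
    return {"status": 200, "body": "profile data"}

# The destructive endpoint was never wrapped -- the gap a reviewer can
# miss because auth logic visibly exists elsewhere in the codebase.
def delete_profile(request):
    return {"status": 200, "body": "profile deleted"}

routes = {"GET /profile": get_profile, "DELETE /profile": delete_profile}

# A trivial coverage check: list routes lacking the auth marker.
unprotected = [path for path, handler in routes.items()
               if not getattr(handler, "protected", False)]
print(unprotected)  # ['DELETE /profile']
```

The point of the sketch is that the dangerous route still "works": a request with no token deletes the profile, while the presence of `require_auth` on a neighboring route suggests protection is in place.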
Summary
DryRun Security released its Agentic Coding Security Report, revealing that 87% of AI-generated pull requests introduced at least one security vulnerability across three major coding agents. In the test, Claude Code (Anthropic), OpenAI Codex, and Google Gemini were each tasked with building two complete applications from scratch via sequential pull requests: a family allergy tracker and a multiplayer racing game. Across 30 total PRs and 38 scans, 143 vulnerabilities were identified, and no application achieved full security. The analysis exposed critical patterns: all three agents introduced the same vulnerability classes, including broken access control (unauthenticated endpoints for destructive operations), business logic failures (client-side validation of sensitive state), and inconsistent security application (authentication middleware applied to REST endpoints but not WebSocket endpoints). Claude Code produced the highest-severity issues, including a critical 2FA-disable bypass; OpenAI's Codex performed best, with fewer final vulnerabilities, but still failed to eliminate the core vulnerability classes; Google Gemini fell between the two.
- Claude Code demonstrated the weakest security posture with the most high-severity unresolved vulnerabilities, while Codex showed marginally better remediation behavior in iterative development
- No commercial AI coding agent currently produces production-ready secure code without human security review
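The REST-versus-WebSocket inconsistency the report describes can be made concrete with a minimal sketch. This is an invented illustration of the pattern, not the agents' actual output: the two dispatch functions stand in for a real framework, and both mutate the same hypothetical game state.

```python
def rest_dispatch(path, token):
    """REST entry point: the agent did add a token check here."""
    if token != "valid":
        return 401
    return 200  # e.g. POST /race/state succeeds

def websocket_message(msg):
    """WebSocket entry point: same state mutation, no token check.

    The auth middleware was wired into the HTTP router only, so this
    second path into the application is left open.
    """
    if msg["type"] == "update_state":
        return 200
    return 400

# The REST path rejects an unauthenticated caller...
assert rest_dispatch("/race/state", token=None) == 401
# ...but the WebSocket path accepts the identical operation.
assert websocket_message({"type": "update_state"}) == 200
```

The lesson is that security review has to cover every entry point into shared state, not just the one where the middleware happens to be registered.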
Editorial Opinion
This report is a sobering wake-up call for organizations relying on AI agents for production code. While these agents excel at rapid feature development, their fundamental inability to reason about security context—to think like an attacker—makes them unsuitable for security-sensitive applications without rigorous human review. The consistency of failures across different vendors suggests this is not a model-specific problem but an architectural limitation of current LLM-based code generation. The most alarming finding is the partial implementation pattern: developers reviewing AI-generated PRs may see authentication logic and assume security coverage exists, creating organizational risk through false confidence. Until AI agents develop adversarial reasoning capabilities or are paired with mandatory security tooling, treating all AI-generated code as inherently insecure should be standard practice.


