Report: 87% of AI-Generated Pull Requests Ship Security Vulnerabilities—No Agent Produced Fully Secure Code
Key Takeaways
- All three AI coding agents introduced the same four vulnerability classes regardless of model or application, suggesting systemic gaps in how LLMs approach security
- AI agents lack threat modeling and security context: they implement features as described without considering attacker perspectives, which leads to inconsistent security coverage
- Partial security implementation is more dangerous than none: agents added JWT tokens and middleware but applied them inconsistently, creating an illusion of protection
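The "partial implementation" pattern above can be sketched in a few lines. This is a hypothetical illustration, not code from the report: all names (`require_auth`, `routes`, `delete_profile`) are invented, and the "middleware" is reduced to a decorator so the coverage gap is visible in isolation.

```python
def require_auth(handler):
    """Decorator marking a handler as requiring a valid token."""
    def wrapped(request):
        if not request.get("token"):
            return {"status": 401, "body": "unauthorized"}
        return handler(request)
    wrapped.protected = True  # marker used by the coverage check below
    return wrapped

@require_auth
def get_profile(request):
    return {"status": 200, "body": "profile data"}

# The destructive endpoint was never wrapped -- the gap a reviewer can
# miss because auth logic visibly exists elsewhere in the codebase.
def delete_profile(request):
    return {"status": 200, "body": "profile deleted"}

routes = {"GET /profile": get_profile, "DELETE /profile": delete_profile}

# A trivial coverage check: list routes lacking the auth marker.
unprotected = [path for path, handler in routes.items()
               if not getattr(handler, "protected", False)]
print(unprotected)  # ['DELETE /profile']
```

The point of the sketch is that the dangerous route still "works": a request with no token deletes the profile, while the presence of `require_auth` on a neighboring route suggests protection is in place.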
Summary
DryRun Security released its Agentic Coding Security Report, revealing that 87% of AI-generated pull requests introduced at least one security vulnerability across three major coding agents. In the test, Claude Code (Anthropic), OpenAI Codex, and Google Gemini were each tasked with building two complete applications from scratch via sequential pull requests: a family allergy tracker and a multiplayer racing game. Across 30 total PRs and 38 scans, 143 vulnerabilities were identified, and no application achieved full security. The analysis exposed critical patterns: all three agents introduced the same vulnerability classes, including broken access control (unauthenticated endpoints for destructive operations), business logic failures (client-side validation of sensitive state), and inconsistent security application (authentication middleware applied to REST endpoints but not WebSocket endpoints). Claude Code produced the highest-severity issues, including a critical 2FA-disable bypass; OpenAI's Codex performed best, with fewer final vulnerabilities, but still failed to eliminate the core vulnerability classes; Google Gemini fell between the two.
- Claude Code demonstrated the weakest security posture with the most high-severity unresolved vulnerabilities, while Codex showed marginally better remediation behavior in iterative development
- No commercial AI coding agent currently produces production-ready secure code without human security review
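The REST-versus-WebSocket inconsistency the report describes can be made concrete with a minimal sketch. This is an invented illustration of the pattern, not the agents' actual output: the two dispatch functions stand in for a real framework, and both mutate the same hypothetical game state.

```python
def rest_dispatch(path, token):
    """REST entry point: the agent did add a token check here."""
    if token != "valid":
        return 401
    return 200  # e.g. POST /race/state succeeds

def websocket_message(msg):
    """WebSocket entry point: same state mutation, no token check.

    The auth middleware was wired into the HTTP router only, so this
    second path into the application is left open.
    """
    if msg["type"] == "update_state":
        return 200
    return 400

# The REST path rejects an unauthenticated caller...
assert rest_dispatch("/race/state", token=None) == 401
# ...but the WebSocket path accepts the identical operation.
assert websocket_message({"type": "update_state"}) == 200
```

The lesson is that security review has to cover every entry point into shared state, not just the one where the middleware happens to be registered.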
Editorial Opinion
This report is a sobering wake-up call for organizations relying on AI agents for production code. While these agents excel at rapid feature development, their fundamental inability to reason about security context—to think like an attacker—makes them unsuitable for security-sensitive applications without rigorous human review. The consistency of failures across different vendors suggests this is not a model-specific problem but an architectural limitation of current LLM-based code generation. The most alarming finding is the partial implementation pattern: developers reviewing AI-generated PRs may see authentication logic and assume security coverage exists, creating organizational risk through false confidence. Until AI agents develop adversarial reasoning capabilities or are paired with mandatory security tooling, treating all AI-generated code as inherently insecure should be standard practice.


