Anthropic's Claude Code vs. OpenAI's Codex: Security Defaults Reveal Different AI Coding Philosophies
Key Takeaways
- Claude Code favors explicit, well-known security libraries (bcrypt), while Codex relies more on standard-library implementations and custom cryptographic code, reflecting different trust models in AI-assisted development
- Neither model volunteered rate limiting or brute-force protection by default, despite these being common security requirements, indicating a gap in AI safety defaults for authentication systems
- Framework choice had a larger impact on security compliance than model choice, with FastAPI's middleware-based protections yielding 96% compliance vs. 73% on Next.js, suggesting developers should pair AI coding tools with frameworks that enforce secure defaults
Summary
A new security benchmark comparing Anthropic's Claude Code and OpenAI's Codex reveals fundamental differences in how these AI code generation tools approach security by default. Researchers at Anthropic tested both models against six common development tasks—authentication, file uploads, search, admin controls, webhooks, and production configuration—using deliberately security-silent prompts to measure what security decisions the AI would make unprompted. The results showed Claude Code consistently chose established security libraries like bcrypt for password hashing across all six sessions, while Codex opted for standard library implementations (PBKDF2, scrypt) and even built JWT encoding from scratch in two sessions using raw HMAC.
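The standard-library route the article attributes to Codex can be sketched in a few lines. This is an illustration, not code from the benchmark: it uses Python's built-in hashlib for PBKDF2 with a random per-password salt and a constant-time comparison, and the iteration count is an assumed value, not one the models chose.

```python
import hashlib
import hmac
import os

# Assumed work factor; OWASP suggests this order of magnitude for PBKDF2-SHA256.
ITERATIONS = 600_000

def hash_password(password: str) -> str:
    """Derive a PBKDF2-HMAC-SHA256 hash with a fresh 16-byte salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash from the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    # hmac.compare_digest avoids leaking a timing side channel.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

By contrast, the bcrypt route Claude Code took requires the third-party `bcrypt` package (`bcrypt.hashpw(pw, bcrypt.gensalt())`), which encodes the salt and cost factor into the hash string itself; the trade-off is an extra dependency in exchange for a harder-to-misuse API.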
Both models exhibited the same critical omissions: neither volunteered rate limiting on login endpoints or security headers, leaving brute-force protection and reconnaissance vectors unaddressed. The benchmark ran 33 exploit tests across 12 total sessions spanning FastAPI and Next.js frameworks, with frameworks themselves accounting for the largest performance gap (FastAPI at 96% compliance vs. Next.js at 73%). Beyond the scorecards, the research highlights that many application security vulnerabilities stem not from exotic attacks but from mundane, unrequested decisions—which hash function to use, whether production still serves API documentation, and whether login endpoints ever throttle. Codex shipped Swagger UI in production and exposed /openapi.json in all sessions, creating reconnaissance opportunities that static analysis missed but dynamic testing caught.
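The login-endpoint throttling neither model volunteered takes little code. Here is a minimal, framework-agnostic sliding-window sketch; the class name and parameters are invented for illustration, and a production deployment would typically back this with a shared store such as Redis rather than process-local memory.

```python
import time
from collections import defaultdict, deque

class LoginRateLimiter:
    """Sliding-window limiter: at most `max_attempts` per `window_seconds`
    for a given key (e.g. client IP or username). In-memory only, so state
    is lost on restart and is not shared across worker processes."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self._attempts: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        q = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_attempts:
            return False  # caller should respond with HTTP 429
        q.append(now)
        return True
```

Wired into a login handler as a pre-check, this closes the unthrottled brute-force vector the benchmark flagged on both models' output.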
Both models also consistently overlooked API documentation exposure (/openapi.json and Swagger UI reachable in production), demonstrating that reconnaissance vectors matter as much as injection vulnerabilities in real-world security posture.
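In FastAPI, closing that reconnaissance vector is a configuration choice at app construction. A sketch of the production setting follows; the `APP_ENV` environment variable is an assumed convention, not part of FastAPI, while `docs_url`, `redoc_url`, and `openapi_url` are real FastAPI parameters.

```python
import os

from fastapi import FastAPI

# Disable interactive docs and the OpenAPI schema outside development.
in_production = os.environ.get("APP_ENV") == "production"

app = FastAPI(
    docs_url=None if in_production else "/docs",          # Swagger UI
    redoc_url=None if in_production else "/redoc",        # ReDoc
    openapi_url=None if in_production else "/openapi.json",
)
```

Passing `None` makes FastAPI skip mounting those routes entirely, so production responds 404 rather than serving a machine-readable map of every endpoint.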
Editorial Opinion
This benchmark is a crucial reality check for AI-assisted development in security-critical contexts. While both Claude and Codex can identify zero-day vulnerabilities in existing code, their approach to writing new code reveals that AI models still lack consistent security intuition for mundane but essential hardening decisions. The research sensibly frames this not as a scorecard but as a lens on the quiet decisions that accumulate into risk—a framing the industry should adopt more broadly. Organizations deploying these tools need to treat AI-generated code as a starting point requiring human security review, not a finished product, and should pair them with frameworks that provide security by default.

