Anthropic's Mythos Model Falls Short of Hype in Real-World Security Testing
Key Takeaways
- Mythos found only 1 confirmed security vulnerability in cURL out of 5 it claimed, with cURL creator Daniel Stenberg calling the results underwhelming compared to the model's marketing positioning
- Stenberg criticized the promotion of Mythos as "primarily marketing" and found no evidence that the model performs better than established AI security tools already in widespread use
- Other AI tools have been significantly more effective on cURL: over the past 10 months their findings have led to 200–300 bugfixes and about a dozen confirmed CVEs, far exceeding Mythos's single finding
Summary
cURL creator Daniel Stenberg has published a critical assessment of Anthropic's Mythos model, describing a significant gap between the company's marketing claims and the model's actual performance at detecting security vulnerabilities. Through Anthropic's Project Glasswing program, Mythos was run against cURL's codebase and reported five security issues. Only one was confirmed as a low-severity vulnerability; three proved to be false positives and one was deemed a trivial bug. Stenberg challenged the "big hype" surrounding Mythos, concluding that the promotion is "primarily marketing" rather than a genuine breakthrough, and he found no evidence that the model outperforms existing tools such as AISLE and Codex. His experience carries particular weight because other AI security tools have tested cURL extensively over the past 10 months, and their findings have collectively led to 200–300 bugfixes and approximately a dozen confirmed CVEs, substantially exceeding Mythos's single confirmed finding.
- Access limitations through Project Glasswing prevented direct testing of the model, raising questions about its transparency and real-world effectiveness
Editorial Opinion
Mythos exemplifies a troubling pattern in AI announcements: aspirational marketing that far outpaces demonstrated capability. While Stenberg's honest assessment is valuable for setting realistic expectations about AI-powered security, it underscores the gap between vendor claims and independent validation. For security teams evaluating Mythos, the lesson is clear—proven tools with established track records may deliver more tangible value than newer models riding hype.

