Claude Mythos Preview and GPT-5.5 Break Autonomous Cybersecurity Benchmarks; AI Cyber Capability Doubling Every Few Months
Key Takeaways
- ▸Frontier AI models are outpacing earlier forecasts: autonomous cyber capability is now doubling every four to five months, down from the eight-month doubling time estimated earlier this year
- ▸Claude Mythos Preview became the first AI model to complete both of AISI's multi-stage cyber range challenges, including the previously unsolved 'Cooling Tower' simulation
- ▸Palo Alto Networks released 26 CVEs covering 75 issues found through AI scanning in one month, roughly 15 times its typical monthly volume, a sign of sharply accelerating vulnerability discovery
Summary
Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 have dramatically exceeded benchmark expectations for autonomous cybersecurity tasks, according to research published Wednesday by the United Kingdom's AI Security Institute (AISI) and Palo Alto Networks. Both frontier models landed well above the trend lines AISI has tracked since late 2024, with autonomous cyber task completion now doubling approximately every four to five months, far faster than the eight-month doubling rate estimated just six months earlier.
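To make the gap between those doubling times concrete, the arithmetic can be sketched as below. This is an illustrative calculation rather than AISI's methodology: it assumes steady exponential growth in whatever quantity is being tracked, and the function name `annual_growth_factor` is hypothetical.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Growth over 12 months for a quantity that doubles every `doubling_months`.

    Assumes smooth exponential growth, which is a simplification.
    """
    return 2 ** (12 / doubling_months)


# Doubling times quoted in the article: 8 months (earlier estimate),
# and 4 to 5 months (current estimate).
for months in (8, 5, 4):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months -> about {factor:.1f}x per year")
```

Under that assumption, an eight-month doubling time implies a bit under 3x growth per year, while a four-month doubling time implies 8x.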
Claude Mythos Preview became the first model to complete both of AISI's cyber range challenges, solving "The Last Ones" (a 32-step corporate network attack simulation) in 6 of 10 attempts and completing "Cooling Tower," which no model had previously solved, in 3 of 10 attempts. GPT-5.5 solved "The Last Ones" in 3 of 10 attempts. Palo Alto Networks, testing Claude Mythos Preview through Anthropic's Project Glasswing initiative, reported that the latest models can identify vulnerabilities and convert them into critical exploits in near real time. The security firm went on to release 26 CVEs addressing 75 issues, roughly 15 times its typical monthly volume.
While AISI cautioned that results are based on limited model data, the institute found the overall trend robust across multiple methodological approaches. Independent research from METR corroborated the findings, confirming the four-month doubling timeline. The rapid advancement raises urgent questions about enterprise vulnerability management and the defensive-offensive imbalance as these autonomous cyber capabilities proliferate beyond research environments.
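The caution about limited data can be illustrated with the success counts reported above. The sketch below computes a 95% Wilson score interval for each result; this is a standard textbook method chosen here for illustration, not something the article attributes to AISI, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Success counts as reported in the article, each out of 10 attempts.
results = {
    "Claude Mythos Preview, The Last Ones": 6,
    "Claude Mythos Preview, Cooling Tower": 3,
    "GPT-5.5, The Last Ones": 3,
}
for label, wins in results.items():
    low, high = wilson_interval(wins, 10)
    print(f"{label}: {wins}/10, 95% interval {low:.2f} to {high:.2f}")
```

With only ten attempts per challenge the intervals are wide, and the ranges for 6 of 10 and 3 of 10 overlap, so individual success rates are best read as rough indicators.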
- ▸Both Anthropic and OpenAI models substantially exceeded AISI's trend line predictions, suggesting a potential inflection point in AI-driven cybersecurity capability
- ▸The rapid advancement presents dual-use concerns: while enabling faster defense, it equally empowers more sophisticated autonomous attacks
Editorial Opinion
The accelerating pace of AI-driven cybersecurity capabilities presents a critical dual-use dilemma. Claude Mythos Preview and GPT-5.5 promise faster vulnerability discovery and remediation, as Palo Alto's CVE output shows, but the same capabilities enable more sophisticated automated attacks. With capability doubling times shrinking from eight months to four or five, the security industry may struggle to maintain defensive parity as these tools proliferate. Enterprises and policymakers must move urgently to close the gap between AI-assisted defense and AI-enabled offense, or risk a profound security asymmetry.



