Claude Mythos Preview and GPT-5.5 Break Autonomous Cybersecurity Benchmarks; AI Cyber Capability Doubling Every Few Months

Key Takeaways

▸Frontier AI models are advancing at unprecedented speed: autonomous cyber capability is doubling every 4-5 months, compared to 8-month intervals measured earlier this year
▸Claude Mythos Preview became the first AI model to complete both of AISI's multi-stage cyber range challenges, including the previously unsolved 'Cooling Tower' simulation
▸Palo Alto Networks identified 26 CVEs (75 total issues) through AI scanning in one month—15x the typical volume—signaling massive acceleration in vulnerability discovery

Source:

Hacker News: https://cyberscoop.com/ai-autonomous-cyber-capability-benchmarks-broken-gpt5-claude-mythos/

Summary

Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 have dramatically exceeded benchmark expectations for autonomous cybersecurity tasks, according to research published Wednesday by the United Kingdom's AI Security Institute (AISI) and Palo Alto Networks. The frontier AI models have shattered trend lines tracked by AISI since late 2024, with autonomous cyber task completion now doubling approximately every four to five months—far faster than the eight-month doubling rate estimated just six months earlier.
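To put the doubling times above in perspective, the compounding can be worked out directly. The sketch below is only the arithmetic of exponential growth, not AISI's methodology; the function name `growth_factor` and the 12-month horizon are illustrative assumptions, and "capability" stands for whatever task-completion metric the trend line tracks.

```python
def growth_factor(doubling_months: float, horizon_months: float) -> float:
    """Return the multiplicative growth over `horizon_months`
    for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (horizon_months / doubling_months)


# Compare the earlier eight-month estimate with the newer four-to-five-month range.
for doubling in (8.0, 5.0, 4.0):
    factor = growth_factor(doubling, horizon_months=12.0)
    print(f"doubling every {doubling:g} months -> x{factor:.2f} per year")

# doubling every 8 months -> x2.83 per year
# doubling every 5 months -> x5.28 per year
# doubling every 4 months -> x8.00 per year
```

Under these assumptions, moving from an eight-month to a four-month doubling time raises the implied yearly growth from just under 3x to 8x.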

Claude Mythos Preview became the first model to complete both of AISI's cyber range challenges. It solved "The Last Ones," a 32-step corporate network attack simulation, in 6 of 10 attempts and completed "Cooling Tower," which no model had previously solved, in 3 of 10 attempts. GPT-5.5 solved "The Last Ones" in 3 of 10 attempts. Palo Alto Networks, which tested Claude Mythos Preview through Anthropic's Project Glasswing initiative, reported that the latest models can identify vulnerabilities and convert them into critical exploits in near real time. As a result, the security firm released 26 CVEs addressing 75 issues, roughly 15 times its typical monthly volume.
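The pass counts above come from ten attempts per challenge, so the underlying success rates are only loosely pinned down. As a rough illustration, the sketch below computes a 95% Wilson score interval for each reported result. This is a standard textbook interval chosen for illustration, not an analysis AISI is reported to have run, and the helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = successes / attempts
    z2 = z * z
    denom = 1.0 + z2 / attempts
    center = (p + z2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(p * (1 - p) / attempts + z2 / (4 * attempts ** 2)) / denom
    return center - half_width, center + half_width


# Pass counts as reported in the article (successes out of 10 attempts).
results = {
    "Claude Mythos Preview, The Last Ones": (6, 10),
    "Claude Mythos Preview, Cooling Tower": (3, 10),
    "GPT-5.5, The Last Ones": (3, 10),
}

for label, (k, n) in results.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n}, 95% interval about {low:.0%} to {high:.0%}")
```

With so few attempts the intervals are wide, which is one reason to read the individual pass rates as indicative rather than precise.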

While AISI cautioned that results are based on limited model data, the institute found the overall trend robust across multiple methodological approaches. Independent research from METR corroborated the findings, confirming the four-month doubling timeline. The rapid advancement raises urgent questions about enterprise vulnerability management and the defensive-offensive imbalance as these autonomous cyber capabilities proliferate beyond research environments.

▸Both Anthropic and OpenAI models substantially exceeded AISI's trend line predictions, suggesting a potential inflection point in AI-driven cybersecurity capability
▸The rapid advancement presents dual-use concerns: while enabling faster defense, it equally empowers more sophisticated autonomous attacks

Editorial Opinion

The accelerating pace of AI-driven cybersecurity capabilities presents a critical dual-use dilemma. While Claude Mythos Preview and GPT-5.5 promise faster vulnerability discovery and remediation—evident in Palo Alto's CVE output—the same capabilities enable more sophisticated automated attacks. The compression of development timelines from years to mere months suggests the security industry may struggle to maintain defensive parity as these tools proliferate. Enterprises and policymakers must move urgently to close the gap between AI-assisted defense and AI-enabled offense, or risk a profound security asymmetry.
