ExploitGym: Frontier AI Models Successfully Exploit Real-World Vulnerabilities
Key Takeaways
- ▸ The ExploitGym benchmark demonstrates that frontier AI models can successfully exploit real-world vulnerabilities across diverse domains, including userspace programs, JavaScript engines, and the Linux kernel
- ▸ Anthropic's Claude Mythos Preview achieved the strongest performance with 157 successful exploits, followed by OpenAI's GPT-5.5 with 120
- ▸ AI models retain meaningful exploitation success rates even with common security protections such as ASLR and DEP enabled, raising systemic cybersecurity concerns
Summary
Researchers have introduced ExploitGym, a large-scale benchmark designed to evaluate AI agents' ability to turn security vulnerabilities into working exploits. The benchmark comprises 898 real-world vulnerability instances across three domains: userspace programs, Google's V8 JavaScript engine, and the Linux kernel. According to the study, frontier AI models demonstrate non-trivial exploitation capabilities, with Anthropic's Claude Mythos Preview achieving the strongest performance by successfully exploiting 157 vulnerabilities, followed by OpenAI's GPT-5.5 with 120 successful exploits.
The research carries significant cybersecurity implications: even with widely deployed security protections such as ASLR and DEP enabled, the models retain meaningful exploitation success rates. This finding highlights the growing risk posed by increasingly capable AI agents, which can combine low-level program reasoning, runtime adaptation, and sustained progress to turn theoretical vulnerabilities into practical attacks. The study establishes ExploitGym as an effective testbed for evaluating AI exploitation capabilities and underscores the urgency of developing robust defenses against AI-powered attacks.
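The study does not detail how ExploitGym decides that an exploit attempt "worked". As an illustration only, a minimal benchmark harness of this kind typically runs the candidate exploit in a sandbox and checks for a proof-of-compromise marker (for example, the contents of a flag file printed to stdout). The command, marker string, and timeout below are all hypothetical:

```python
import subprocess

def exploit_succeeded(cmd, marker="PWNED", timeout=30):
    """Run a candidate exploit command and report success.

    Hypothetical success criterion: the exploit is expected to print a
    proof-of-compromise marker (e.g. the contents of a flag file) to
    stdout before the timeout expires.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # Hung or stalled exploit attempts count as failures.
        return False
    return marker in result.stdout

# Demo with a stand-in "exploit" that merely prints the marker:
print(exploit_succeeded(["echo", "PWNED"]))    # → True
print(exploit_succeeded(["echo", "nothing"]))  # → False
```

A real harness would additionally isolate each attempt (container or VM), since a working kernel or V8 exploit can corrupt the host it runs on.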
Editorial Opinion
ExploitGym represents a critical step in responsibly evaluating AI capabilities for a dual-use application that could reshape cybersecurity. The finding that frontier models can exploit real vulnerabilities at scale—even with defenses enabled—should trigger serious discussions about AI safety and security governance. While the benchmark serves defensive purposes, it also demonstrates how capable AI agents could lower the barrier for adversarial exploitation, making this research simultaneously valuable for security teams and concerning for the broader industry.