Research Shows AI Cybersecurity Capability Is 'Jagged': Smaller Open Models Match Mythos on Key Vulnerability Discovery Tasks
Key Takeaways
- Small open-weight models (3.6B-5.1B parameters) matched Mythos's performance on specific showcase vulnerabilities, suggesting model size alone is not determinative of cybersecurity capability
- AI cybersecurity capability does not scale smoothly with model parameters; performance is "jagged" and task-dependent, with no single best model across different vulnerability types
- The competitive advantage in AI cybersecurity comes from integrated system design, maintainer relationships, and operational pipelines that achieve accepted patches, not from frontier model capabilities alone
Summary
Following Anthropic's April 2026 announcement of Claude Mythos and Project Glasswing, a consortium aimed at using AI to discover zero-day vulnerabilities in critical software, independent research from AISLE demonstrates that cybersecurity advantage is not purely a function of model size or sophistication. When tested on Anthropic's showcase vulnerabilities, small open-weight models (as small as 3.6 billion parameters, at $0.11 per million tokens) recovered much of the same vulnerability analysis as Mythos, including detection of an autonomous FreeBSD exploit and analysis of a 27-year-old OpenBSD bug. AISLE has independently validated over 180 CVEs across more than 30 projects since mid-2025, including all 12 vulnerabilities in a single OpenSSL security release. The research indicates that AI cybersecurity capability does not scale smoothly with model size: the capability frontier is highly "jagged," with different models excelling at different tasks. Rather than a single superior model, the real moat in AI cybersecurity lies in the specialized system architecture, deep security expertise, and operational infrastructure built around vulnerability discovery, validation, and remediation pipelines. Independent validation from operational systems, such as AISLE's 180+ discovered CVEs, also suggests the market may be bifurcating between specialized cybersecurity systems and general-purpose AI models.
Editorial Opinion
Anthropic's Mythos announcement created important momentum for AI-assisted vulnerability discovery, but this research suggests the narrative of "frontier models solving cybersecurity" may oversimplify how capability actually distributes across model scales and specializations. The finding that 3.6B-parameter models can match Mythos on specific tasks is striking and challenges assumptions about scale. However, neither the Mythos announcement nor this critique fully addresses the operational end-to-end pipeline question: who can actually move from discovery to trust-based remediation with maintainers at scale? That may be where the real differentiation lies.