numasec: Open-Source AI Pentester Discovers 22 Vulnerabilities in Quiz App, Achieves 96% Detection Rate on OWASP Juice Shop
Key Takeaways
- numasec's multi-agent architecture coordinates 10 specialized agents with role-based permissions to conduct full penetration tests, moving beyond single-scanner tools to orchestrate 21 security tools through formal PTES methodology
- The tool achieved 96% vulnerability recall on OWASP Juice Shop and discovered 22 real vulnerabilities in a live application, including complex attack chains (e.g., leaked API keys → SSRF → cloud metadata → account takeover)
- A built-in knowledge base of 34 templates and strict evidence criteria are designed to prevent hallucination and false positives, while findings are automatically enriched with CWE IDs, CVSS 3.1 scores, and actionable remediation guidance
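For readers unfamiliar with the metric, recall is the share of known vulnerabilities that a tool actually finds. The sketch below shows the arithmetic only; the counts are hypothetical placeholders chosen to yield 96%, since the exact numbers behind the Juice Shop result are not given here.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of known vulnerabilities that were detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined when there are no known vulnerabilities")
    return true_positives / total


# Hypothetical counts for illustration only (not numasec's published figures).
found, missed = 48, 2
print(f"recall = {recall(found, missed):.0%}")
```

Recall says nothing about false positives; that side is measured by precision, which is why the evidence criteria mentioned above matter alongside the recall figure.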
Summary
Francesco Costa has released numasec, an open-source AI-powered penetration testing tool that autonomously identifies security vulnerabilities and reports a 96% recall rate on the OWASP Juice Shop benchmark. The tool orchestrates 10 specialized agents running 21 offensive security tools through formal PTES (Penetration Testing Execution Standard) methodology, moving beyond simple vulnerability scanning to chain attacks together and produce professional reports with CVSS scores and remediation guidance.
In a real-world demonstration, Costa ran numasec against a vibe-coded quiz application and uncovered 22 distinct vulnerabilities, including SQL injection, authentication bypasses, SSRF, and XSS. The tool's multi-agent architecture assigns distinct roles and permissions to different agents (reconnaissance, vulnerability hunting, secure code review, and reporting), which keeps each agent narrowly focused and reduces false positives through dedicated analyst validation.
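The role-and-permission idea can be illustrated with a minimal sketch. This is not numasec's code: the `Agent` class, the `dispatch` function, and the tool names are all hypothetical, and only the four role names are taken from the description above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Agent:
    """An agent role plus the tools it is permitted to invoke (hypothetical model)."""
    name: str
    allowed_tools: FrozenSet[str]

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


# Role names follow the article; the tool names are invented for illustration.
AGENTS = {
    "recon": Agent("reconnaissance", frozenset({"port_scan", "dir_enum"})),
    "hunter": Agent("vulnerability hunting", frozenset({"sqli_probe", "xss_probe"})),
    "reviewer": Agent("secure code review", frozenset({"static_analysis"})),
    "reporter": Agent("reporting", frozenset()),  # read-only: no offensive tools
}


def dispatch(agent: Agent, tool: str, target: str) -> str:
    """Refuse any tool call that falls outside the agent's role."""
    if not agent.can_use(tool):
        raise PermissionError(f"{agent.name} may not run {tool}")
    # A real orchestrator would execute the tool here; this sketch only records the call.
    return f"{agent.name} -> {tool}({target})"


print(dispatch(AGENTS["recon"], "port_scan", "quiz.example.test"))
```

Checking permissions at the dispatch point, rather than trusting each agent's prompt, means a confused or manipulated agent still cannot reach tools outside its role.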
numasec is model-agnostic: it supports multiple LLM backends, including Claude, GPT-4, Gemini, DeepSeek, Mistral, and any OpenAI-compatible API, and can be installed via pip, Docker, or direct download. A built-in knowledge base of 34 templates is intended to curb AI hallucination and supply proven exploitation techniques. Findings are output in SARIF, Markdown, HTML, and JSON formats with full OWASP Top 10 mapping and CWE attribution.
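To make the SARIF output concrete, here is a minimal sketch of how a single finding might be wrapped in a SARIF 2.1.0 log. It assumes nothing about numasec's real report layout: the `build_sarif` helper, the rule ID, the example finding, and the use of the `properties` bag for CWE and CVSS data are illustrative choices.

```python
import json
from typing import Any, Dict, List


def build_sarif(findings: List[Dict[str, Any]], tool_name: str = "numasec") -> Dict[str, Any]:
    """Wrap findings in a minimal SARIF 2.1.0 log (illustrative helper, not numasec's API)."""
    results = [
        {
            "ruleId": f["rule_id"],
            "level": "error",
            "message": {"text": f["title"]},
            # SARIF allows tool-specific metadata in a free-form "properties" bag.
            "properties": {
                "cwe": f["cwe"],
                "cvssScore": f["cvss_score"],
                "cvssVector": f["cvss_vector"],
            },
        }
        for f in findings
    ]
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


# Hypothetical finding; CWE-89 is the standard identifier for SQL injection.
example = {
    "rule_id": "SQLI-001",
    "title": "SQL injection in quiz search parameter",
    "cwe": "CWE-89",
    "cvss_score": 9.8,
    "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
}
print(json.dumps(build_sarif([example]), indent=2))
```

Because SARIF is a standard interchange format, a log shaped like this can be consumed by code-scanning dashboards that accept SARIF, which is the practical benefit of offering it alongside Markdown and HTML.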
Editorial Opinion
numasec represents a meaningful step beyond marketing-driven "AI security tools" by implementing genuine multi-agent coordination with specialized roles and a formal penetration testing methodology. The 96% benchmark result and the 22 vulnerabilities found in a live application suggest real technical depth, though results may vary with the LLM backend chosen, which raises consistency and reproducibility questions that future evaluations should address. The open-source release and model-agnostic architecture are a strategic strength that could accelerate adoption across enterprises with varying LLM preferences.



