numasec: Open-Source AI Pentester Discovers 22 Vulnerabilities in Quiz App, Achieves 96% Detection Rate on OWASP Juice Shop

Key Takeaways

▸ numasec coordinates 10 specialized agents with role-based permissions to conduct full penetration tests. Rather than acting as a single scanner, it orchestrates 21 security tools following the formal PTES methodology.
▸ The tool reported 96% vulnerability recall on OWASP Juice Shop and found 22 real vulnerabilities in a live application, including multi-step attack chains (e.g., leaked API keys → SSRF → cloud metadata → account takeover).
▸ A built-in knowledge base of 34 templates and strict evidence criteria are intended to limit hallucination and false positives, while findings are automatically enriched with CWE IDs, CVSS 3.1 scores, and actionable remediation guidance.
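For readers unfamiliar with the metric, the recall figure in the takeaways is the share of known vulnerabilities that a tool actually detects. The sketch below shows that calculation only; the function name and the counts are illustrative assumptions, not numasec's code or its actual benchmark data.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Return the fraction of known vulnerabilities that were detected."""
    total_known = true_positives + false_negatives
    if total_known == 0:
        raise ValueError("benchmark must contain at least one known vulnerability")
    return true_positives / total_known


# Hypothetical counts chosen for illustration; they are NOT taken from
# the numasec benchmark run.
found, missed = 48, 2
print(f"recall = {recall(found, missed):.0%}")
```

Recall says nothing about false positives; that is measured separately by precision, which is why the evidence criteria mentioned above matter alongside the headline number.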

Source:

Hacker News: https://github.com/FrancescoStabile/numasec

Summary

Francesco Costa has released numasec, an open-source AI-powered penetration testing tool that autonomously identifies security vulnerabilities, reporting 96% recall on the OWASP Juice Shop benchmark. The tool orchestrates 10 specialized agents running 21 offensive security tools through the formal PTES (Penetration Testing Execution Standard) methodology. It goes beyond simple vulnerability scanning by chaining attacks together and producing professional reports with CVSS scores and remediation guidance.

In a real-world demonstration, Costa ran numasec against a vibe-coded quiz application and uncovered 22 distinct vulnerabilities spanning SQL injection, authentication bypasses, SSRF attacks, XSS flaws, and more. The tool's multi-agent architecture assigns distinct roles and permissions to different agents—reconnaissance, vulnerability hunting, secure code review, and reporting—ensuring specialized focus and reducing false positives through dedicated analyst validation.
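One way to picture role-based permissions for agents is an allow-list of tools per role that is checked before any tool runs. The following is a minimal sketch under that assumption; the `Agent` class, `run_tool` function, and tool names are hypothetical and do not reflect numasec's actual implementation.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when an agent requests a tool outside its role."""


@dataclass(frozen=True)
class Agent:
    name: str
    allowed_tools: frozenset  # tool names this role may invoke


def run_tool(agent: Agent, tool: str, target: str) -> str:
    # Enforce the role's allow-list before doing any work.
    if tool not in agent.allowed_tools:
        raise PermissionDenied(f"{agent.name} may not run {tool}")
    # A real system would dispatch to the tool here; this sketch only
    # returns a description of what would have been executed.
    return f"{agent.name} ran {tool} against {target}"


# Hypothetical roles: reconnaissance may scan, reporting may not.
recon = Agent("recon", frozenset({"port_scan", "dir_enum"}))
reporter = Agent("reporter", frozenset())

print(run_tool(recon, "port_scan", "http://localhost:3000"))
try:
    run_tool(reporter, "port_scan", "http://localhost:3000")
except PermissionDenied as exc:
    print(f"blocked: {exc}")
```

Keeping the check in a single dispatch function means a reporting agent cannot launch an offensive tool even if the underlying model asks for one.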

numasec supports multiple LLM backends, including Claude, GPT-4, Gemini, DeepSeek, and Mistral, making it model-agnostic. The platform includes a built-in knowledge base of 34 templates that is intended to reduce AI hallucination and supply proven exploitation techniques. Findings are output in SARIF, Markdown, HTML, and JSON formats with full OWASP Top 10 mapping and CWE attribution.

The model-agnostic design also extends to any OpenAI-compatible API, and the tool can be installed via pip, Docker, or direct download.
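The enrichment step described in this summary (CWE IDs and CVSS 3.1 scores attached to each finding) can be sketched as a small record serialized to JSON. The severity bands follow the CVSS 3.1 qualitative rating scale; the `Finding` fields, the function names, and the example score are assumptions for illustration and are not numasec's actual schema or output.

```python
import json
from dataclasses import asdict, dataclass


def cvss31_severity(score: float) -> str:
    """Map a CVSS 3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS 3.1 scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    # Hypothetical field names, not numasec's report schema.
    title: str
    cwe_id: str
    cvss_score: float
    remediation: str


def to_json(finding: Finding) -> str:
    """Serialize a finding, adding the severity derived from its score."""
    record = asdict(finding)
    record["severity"] = cvss31_severity(finding.cvss_score)
    return json.dumps(record, indent=2)


# Illustrative finding; the score is an example value, not a reported result.
example = Finding(
    title="SQL injection in login form",
    cwe_id="CWE-89",
    cvss_score=9.8,
    remediation="Use parameterized queries.",
)
print(to_json(example))
```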

Editorial Opinion

numasec represents a meaningful evolution beyond marketing-driven 'AI security tools' by implementing genuine multi-agent coordination with specialized roles and formal penetration testing methodology. The 96% benchmark performance and real-world 22-vulnerability discovery suggest real technical depth, though the reliance on multiple LLM providers introduces consistency and reproducibility questions that future evaluations should address. Open-source release with model-agnostic architecture is a strategic strength that could accelerate adoption across enterprises with varying LLM preferences.

