Researcher Uses LLMs to Discover Dozens of Vulnerabilities in Major Open-Source Projects

Key Takeaways

▸LLMs identified serious vulnerabilities in popular open-source projects without manual code review, supporting the viability of AI-powered security research
▸Over-scaffolding and bloated context windows paradoxically harm vulnerability detection; the "needle-in-the-haystack" problem causes models to miss critical details buried in lengthy code
▸Optimal LLM-based security auditing requires minimal persistent scaffolding paired with targeted exploration, avoiding both excessive orchestration and completely unguided analysis

Source:

Hacker News: https://devansh.bearblog.dev/needle-in-the-haystack/

Summary

A security researcher has published findings on using large language models (LLMs) such as Claude Opus to identify vulnerabilities in well-known open-source projects, discovering over a dozen CVEs entirely through AI-powered analysis without manual code review. The research demonstrates that agentic LLMs can uncover obscure security issues in major projects including Parse Server, HonoJS, ElysiaJS, and Harden Runner. However, the researcher challenges conventional wisdom about AI-assisted security auditing, finding that excessive context and over-scaffolding degrade vulnerability detection because of "context rot", in which model reliability deteriorates as the token count increases. The most effective approach, according to the researcher, combines minimal persistent scaffolding with maximal targeted exploration, keeping the model's focus anchored to critical details.
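To make the idea of a lean context concrete, here is a minimal sketch of budget-limited snippet selection. It is not the researcher's tooling: the `Snippet` type, the `relevance` score, and the four-characters-per-token estimate are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    path: str
    text: str
    relevance: float  # hypothetical score: higher means closer to the current hypothesis


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Not a real tokenizer.
    return max(1, len(text) // 4)


def select_snippets(snippets: list[Snippet], budget_tokens: int) -> list[Snippet]:
    """Greedily keep the most relevant snippets that still fit the token budget."""
    chosen: list[Snippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(snippet.text)
        if used + cost > budget_tokens:
            continue  # skip anything that would bloat the context
        chosen.append(snippet)
        used += cost
    return chosen
```

The greedy pass favours a small, focused context over completeness, which is the trade-off the summary describes; a real agent would presumably re-run the selection as its hypothesis changes.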

The research also points to primacy and recency biases in LLMs: models perform better when relevant information appears near the beginning or end of the context rather than in the middle.
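One way to act on the primacy/recency observation is to place the task and the code under review at the edges of the prompt and push supporting material into the middle. The sketch below assumes a plain-text prompt; the function name `build_prompt` and the section labels are hypothetical and are not taken from the original research.

```python
def build_prompt(task: str, focus: str, background: list[str]) -> str:
    """Assemble a prompt with the task first and the critical code last."""
    parts = [f"TASK:\n{task}"]  # start of context (primacy)
    # Less critical supporting material goes in the middle of the context.
    parts.extend(f"BACKGROUND:\n{item}" for item in background)
    # The code that matters most sits near the end (recency),
    # followed by a short restatement of the task.
    parts.append(f"FOCUS (review this code closely):\n{focus}")
    parts.append(f"REMINDER:\n{task}")
    return "\n\n".join(parts)
```

Restating the task at the end is a design choice made for this sketch, not a confirmed detail of the researcher's method.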

Editorial Opinion

This research challenges assumptions about how to effectively deploy LLMs for security work and suggests that simpler, more focused prompting strategies may outperform elaborate orchestration frameworks. The finding that context rot degrades vulnerability detection is particularly important for the security community, as it implies that traditional approaches to comprehensive code analysis may need rethinking when using AI tools. If these results hold at scale, they could reshape how organizations approach AI-assisted security auditing.

Anthropic

RESEARCH Anthropic 2026-03-10

