BotBeat

Anthropic | RESEARCH | 2026-03-10

Researcher Uses LLMs to Discover Dozens of Vulnerabilities in Major Open-Source Projects

Key Takeaways

  • LLMs can identify serious vulnerabilities in popular open-source projects without manual code review, demonstrating the viability of AI-powered security research
  • Over-scaffolding and bloated context windows paradoxically harm vulnerability detection; the "needle-in-the-haystack" problem causes models to miss critical details buried in lengthy code
  • Effective LLM-based security auditing pairs minimal persistent scaffolding with targeted exploration, avoiding both excessive orchestration and completely unguided analysis
Source: Hacker News, https://devansh.bearblog.dev/needle-in-the-haystack/
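
The "minimal scaffolding, targeted exploration" idea can be illustrated with a small sketch that caps how much code is handed to a model at once. This is not the researcher's tooling: the `Snippet` type, the `relevance` score, and the rough four-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    path: str         # hypothetical file path
    text: str         # source code to show the model
    relevance: float  # hypothetical score from any heuristic you trust


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def select_within_budget(snippets, budget_tokens):
    """Greedily keep the most relevant snippets that fit in the token budget."""
    chosen, used = [], 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        cost = estimate_tokens(s.text)
        if used + cost <= budget_tokens:
            chosen.append(s)
            used += cost
    return chosen, used


candidates = [
    Snippet("auth.js", "x" * 400, 0.9),
    Snippet("vendor_bundle.js", "y" * 4000, 0.8),
    Snippet("routes.js", "z" * 200, 0.5),
]
chosen, used = select_within_budget(candidates, budget_tokens=200)
print([s.path for s in chosen], used)
```

In this toy run the large bundle is skipped because it alone would exceed the budget, so the prompt stays short and focused on the two smaller files.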

Summary

A security researcher has published findings on using large language models (LLMs) such as Claude Opus to identify vulnerabilities in well-known open-source projects, discovering over a dozen CVEs entirely through AI-powered analysis, without manual code review. The research demonstrates that agentic LLMs can uncover obscure security issues in major projects including Parse Server, HonoJS, ElysiaJS, and Harden Runner. However, the researcher challenges conventional wisdom about AI-assisted security auditing, finding that excessive context and over-scaffolding degrade vulnerability detection because of "context rot", where model reliability deteriorates as token count increases. The most effective approach instead combines minimal persistent scaffolding with maximal targeted exploration, keeping the model's focus anchored to critical details.

  • The research also reports primacy and recency biases in LLMs: models perform better when relevant information appears near the start or end of the context than when it sits in the middle
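
One way to act on the primacy/recency observation is to order context so the most relevant pieces sit at the edges and the least relevant in the middle. The sketch below is only an illustration of that ordering, not a method taken from the research; the `(item, score)` pairs and the function name `order_for_boundaries` are hypothetical.

```python
def order_for_boundaries(items_with_scores):
    """Place higher-scored items near the start and end, lower-scored in the middle."""
    ranked = sorted(items_with_scores, key=lambda pair: pair[1], reverse=True)
    front, back = [], []
    for i, (item, _score) in enumerate(ranked):
        # Alternate: best goes first, second best goes last, and so on inward.
        (front if i % 2 == 0 else back).append(item)
    return front + back[::-1]


scored = [("a", 5), ("b", 4), ("c", 3), ("d", 2), ("e", 1)]
print(order_for_boundaries(scored))  # ['a', 'c', 'e', 'd', 'b']
```

Here the two highest-scored items, `a` and `b`, end up first and last, while the lowest-scored item, `e`, lands in the middle.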

Editorial Opinion

This research challenges assumptions about how to effectively deploy LLMs for security work and suggests that simpler, more focused prompting strategies may outperform elaborate orchestration frameworks. The finding that context rot degrades vulnerability detection is particularly important for the security community, as it implies that traditional approaches to comprehensive code analysis may need rethinking when using AI tools. If these results hold at scale, they could reshape how organizations approach AI-assisted security auditing.

Large Language Models (LLMs), AI Agents, Cybersecurity, AI Safety & Alignment
