BotBeat

Snyk
RESEARCH · 2026-03-05

The 89% Problem: LLMs Resurrect Millions of Abandoned Open Source Packages, Breaking Trust Models

Key Takeaways

  • AI coding assistants are breaking the decade-old "prevalence equals trust" model by recommending abandoned and unmaintained open source packages based on statistical patterns in training data
  • LLMs select packages probabilistically from their entire training corpus, including millions of deprecated, experimental, and dormant projects that human developers would typically avoid
  • Snyk has identified this as "the 89% problem," referring to the vast majority of open source packages outside the actively maintained, popular ecosystem
Source: Hacker News · https://snyk.io/blog/llms-resurrecting-open-source-dormant-majority/

Summary

Snyk researchers have identified a critical shift in open source security as AI coding assistants inadvertently revive millions of dormant and abandoned packages. For the past decade, developers have relied on a "prevalence equals trust" heuristic: popular packages like lodash and react were considered safe because of community scrutiny and high download counts. This social trust model worked because human developers naturally gravitate toward well-maintained, widely adopted libraries. Generative AI systems operate fundamentally differently, selecting packages based on statistical patterns across their entire training data, which includes the full history of open source code regardless of current maintenance status or security posture.

The problem stems from how LLMs process information: they don't understand concepts like "popularity" or "maintenance health" the way human developers do. Instead, they make probabilistic selections based on code patterns they've encountered during training, which encompasses not just current best practices but also experimental repositories, deprecated libraries, and long-abandoned projects. This means AI assistants may recommend packages that haven't been updated in years, contain known vulnerabilities, or lack active maintainers—simply because they appeared frequently in historical codebases that formed the training data.

Snyk has responded with a multi-pronged approach to address what they call "the 89% problem"—referring to the vast majority of open source packages that exist outside the popular, well-maintained ecosystem. Their solution includes enhanced package discovery through security.snyk.io for identifying trusted packages, a Package Health API that verifies overall package health beyond just known vulnerabilities, integration with Snyk Studio to enforce dependency safety when AI generates code, and defenses against AI hallucinations that might introduce fictional or compromised packages. The company emphasizes that as AI coding assistants become ubiquitous, the security community must evolve beyond popularity-based trust models to more sophisticated provenance and health verification systems.


Editorial Opinion

This research highlights a genuinely novel security challenge that emerges at the intersection of AI and software supply chains. While much attention has focused on AI hallucinating fake packages or introducing vulnerable code, the more insidious problem may be AI's indiscriminate resurrection of real but abandoned packages that humans had collectively decided to leave behind. Snyk's framing of the "89% problem" effectively captures how AI systems democratize risk across the entire corpus of historical code, not just current best practices. The shift from popularity-based to provenance-based trust models represents a necessary evolution, though it raises questions about how to scale human curation and verification across millions of packages that AI might potentially revive.

AI Agents · Cybersecurity · AI Safety & Alignment · Open Source
