Security Researchers Expose Critical Prompt Injection Vulnerabilities in Claude, Gemini, and Copilot—But No Public Disclosure
Key Takeaways
- ▸Prompt injection attacks on AI coding agents can steal credentials by embedding malicious instructions in code comments, pull requests, and issue descriptions that the agent cannot distinguish from legitimate developer requests
- ▸The vulnerability is a fundamental property of how LLMs work, not a fixable bug in individual products, as models lack hardware-enforced boundaries between trusted and untrusted input
- ▸All three major AI coding agent providers (Anthropic, Google, Microsoft) addressed the findings with bug bounties but declined to issue CVEs, raising transparency and disclosure concerns
Summary
Security researchers have demonstrated that AI coding agents from Anthropic (Claude), Google (Gemini Code Assist), and Microsoft (GitHub Copilot) are vulnerable to prompt injection attacks that can steal credentials and secrets. The attack is remarkably simple: malicious instructions embedded in pull requests, code comments, README files, or issue descriptions are processed by the agents as legitimate commands, allowing attackers to exfiltrate API keys, environment variables, SSH keys, and other sensitive data. All three companies acknowledged the findings and paid bug bounties, but notably issued no CVE disclosures—a stark contrast to standard security practice for critical vulnerabilities.
The vulnerability reflects a fundamental architectural property of large language models: they process all input as a single stream of tokens without hardware-enforced boundaries between trusted developer instructions and untrusted environmental content. The agents make decisions about what to follow based on statistical patterns rather than formal access controls. This issue has been recognized as the #1 vulnerability in the OWASP Top 10 for LLM Applications since 2023, and research has shown instruction-hierarchy defenses can be bypassed with near-100% success rates. Notably, the attack is not theoretical—a government agency disclosed in March 2025 that AI coding agents were used as an attack vector in a real breach of internal systems, with agents granted access to repositories and CI/CD pipelines.
- The attack has already been exploited in the wild—a government agency disclosed a 2025 breach where attackers used prompt injection via code review comments to compromise internal systems with access to repositories and CI/CD pipelines
Editorial Opinion
The absence of CVE disclosures from Anthropic, Google, and Microsoft for a demonstrably exploitable vulnerability affecting production systems is troubling and sets a dangerous precedent for AI security accountability. While prompt injection is indeed an architectural challenge inherent to LLMs rather than a traditional software bug, the real-world breach described by the government agency proves this is not an academic concern—it is an active threat. The industry's apparent reluctance to formally disclose and transparently communicate these risks to end users mirrors historical patterns of security theater over substance. Until AI coding agents operate with formal capability boundaries and explicit user consent mechanisms, treating prompt injection as a fundamental design limitation rather than an unfixable flaw is both intellectually honest and practically necessary.


