Security Researcher Discloses Claude Jailbreak Vulnerabilities Across All Production Tiers; Anthropic Reportedly Unresponsive

Key Takeaways

▸Constitutional AI safety mechanisms in Claude models can be bypassed through memory protocol manipulation and incremental prompt escalation across extended conversations
▸A single 20-minute mobile session allegedly extracted 915 files from Claude.ai's sandbox environment, raising concerns about sandbox isolation and code execution environments
▸Anthropic's responsible disclosure process reportedly failed entirely—six submissions over 27 days received no acknowledgment, triage, or response despite a stated 3-business-day SLA

Source:

Hacker Newshttps://github.com/Nicholas-Kloster/claude-4.6-jailbreak-vulnerability-disclosure-unredacted↗

Summary

A security researcher operating under the pseudonym NuClide has published an unredacted disclosure detailing jailbreak vulnerabilities affecting Claude Opus 4.6 ET, Sonnet 4.6 ET, and Haiku 4.5 ET. According to the disclosure, all three production model tiers can be manipulated through prompt injection and memory-based protocol manipulation to bypass constitutional safety checks and generate exploit code. The researcher claims to have submitted vulnerability reports through six separate channels to Anthropic over a 27-day period starting March 4, 2026, with zero acknowledgment from the company.

The disclosure details two primary attack vectors: an "Ambiguity Front-Loading" (AFL) jailbreak technique that causes models to flag their own safety concerns before overriding them, and a "Sandbox Snapshot Exfiltration" attack that allegedly extracted 915 files from Claude.ai's code execution sandbox, including system configuration files and authentication tokens. The researcher argues that Anthropic violated its own responsible disclosure policy, which commits to acknowledging submissions within three business days. The full technical details, proof-of-concept code, videos, and interactive tools have been published publicly under a CC BY 4.0 license.

The vulnerability affects all three production model tiers simultaneously, suggesting a systemic flaw in constitutional AI implementation rather than an isolated bug

Editorial Opinion

If verified, this disclosure represents a significant failure across multiple dimensions: the technical integrity of Anthropic's safety mechanisms, the company's security operations maturity, and its commitment to coordinated vulnerability disclosure. The claim that constitutional AI can be systematically bypassed through memory manipulation calls into question the robustness of Anthropic's alignment approach. Equally troubling is the alleged unresponsiveness—a 27-day silence on disclosed jailbreaks suggests either critical gaps in security infrastructure or organizational dysfunction during a sensitive period.

Security Researcher Discloses Claude Jailbreak Vulnerabilities Across All Production Tiers; Anthropic Reportedly Unresponsive

Key Takeaways

▸Constitutional AI safety mechanisms in Claude models can be bypassed through memory protocol manipulation and incremental prompt escalation across extended conversations
▸A single 20-minute mobile session allegedly extracted 915 files from Claude.ai's sandbox environment, raising concerns about sandbox isolation and code execution environments
▸Anthropic's responsible disclosure process reportedly failed entirely—six submissions over 27 days received no acknowledgment, triage, or response despite a stated 3-business-day SLA

Summary

The vulnerability affects all three production model tiers simultaneously, suggesting a systemic flaw in constitutional AI implementation rather than an isolated bug

Editorial Opinion

If verified, this disclosure represents a significant failure across multiple dimensions: the technical integrity of Anthropic's safety mechanisms, the company's security operations maturity, and its commitment to coordinated vulnerability disclosure. The claim that constitutional AI can be systematically bypassed through memory manipulation calls into question the robustness of Anthropic's alignment approach. Equally troubling is the alleged unresponsiveness—a 27-day silence on disclosed jailbreaks suggests either critical gaps in security infrastructure or organizational dysfunction during a sensitive period.

Security Researcher Discloses Claude Jailbreak Vulnerabilities Across All Production Tiers; Anthropic Reportedly Unresponsive

Key Takeaways

Summary

Editorial Opinion

More from Anthropic

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says

Anthropic Claude Code Sandbox Bypass: Second Vulnerability Exposes Critical Data Exfiltration Risk

AI Safety Catastrophically Underfunded: Economic Model Reveals Incentive Gap

Comments

Suggested

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says

Anthropic Claude Code Sandbox Bypass: Second Vulnerability Exposes Critical Data Exfiltration Risk

OpenAI Prepares to File to Go Public in Coming Weeks

Security Researcher Discloses Claude Jailbreak Vulnerabilities Across All Production Tiers; Anthropic Reportedly Unresponsive

Key Takeaways

Summary

Editorial Opinion

More from Anthropic

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says

Anthropic Claude Code Sandbox Bypass: Second Vulnerability Exposes Critical Data Exfiltration Risk

AI Safety Catastrophically Underfunded: Economic Model Reveals Incentive Gap

Comments

Suggested

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says

Anthropic Claude Code Sandbox Bypass: Second Vulnerability Exposes Critical Data Exfiltration Risk

OpenAI Prepares to File to Go Public in Coming Weeks