BotBeat
...
← Back

> ▌

AnthropicAnthropic
POLICY & REGULATIONAnthropic2026-04-03

Security Researcher Discloses Claude Jailbreak Vulnerabilities Across All Production Tiers; Anthropic Reportedly Unresponsive

Key Takeaways

  • ▸Constitutional AI safety mechanisms in Claude models can be bypassed through memory protocol manipulation and incremental prompt escalation across extended conversations
  • ▸A single 20-minute mobile session allegedly extracted 915 files from Claude.ai's sandbox environment, raising concerns about sandbox isolation and code execution environments
  • ▸Anthropic's responsible disclosure process reportedly failed entirely—six submissions over 27 days received no acknowledgment, triage, or response despite a stated 3-business-day SLA
Source:
Hacker Newshttps://github.com/Nicholas-Kloster/claude-4.6-jailbreak-vulnerability-disclosure-unredacted↗

Summary

A security researcher operating under the pseudonym NuClide has published an unredacted disclosure detailing jailbreak vulnerabilities affecting Claude Opus 4.6 ET, Sonnet 4.6 ET, and Haiku 4.5 ET. According to the disclosure, all three production model tiers can be manipulated through prompt injection and memory-based protocol manipulation to bypass constitutional safety checks and generate exploit code. The researcher claims to have submitted vulnerability reports through six separate channels to Anthropic over a 27-day period starting March 4, 2026, with zero acknowledgment from the company.

The disclosure details two primary attack vectors: an "Ambiguity Front-Loading" (AFL) jailbreak technique that causes models to flag their own safety concerns before overriding them, and a "Sandbox Snapshot Exfiltration" attack that allegedly extracted 915 files from Claude.ai's code execution sandbox, including system configuration files and authentication tokens. The researcher argues that Anthropic violated its own responsible disclosure policy, which commits to acknowledging submissions within three business days. The full technical details, proof-of-concept code, videos, and interactive tools have been published publicly under a CC BY 4.0 license.

  • The vulnerability affects all three production model tiers simultaneously, suggesting a systemic flaw in constitutional AI implementation rather than an isolated bug

Editorial Opinion

If verified, this disclosure represents a significant failure across multiple dimensions: the technical integrity of Anthropic's safety mechanisms, the company's security operations maturity, and its commitment to coordinated vulnerability disclosure. The claim that constitutional AI can be systematically bypassed through memory manipulation calls into question the robustness of Anthropic's alignment approach. Equally troubling is the alleged unresponsiveness—a 27-day silence on disclosed jailbreaks suggests either critical gaps in security infrastructure or organizational dysfunction during a sensitive period.

CybersecurityEthics & BiasAI Safety & Alignment

More from Anthropic

AnthropicAnthropic
RESEARCH

Inside Claude Code's Dynamic System Prompt Architecture: Anthropic's Complex Context Engineering Revealed

2026-04-05
AnthropicAnthropic
POLICY & REGULATION

Anthropic Explores AI's Role in Autonomous Weapons Policy with Pentagon Discussion

2026-04-05
AnthropicAnthropic
POLICY & REGULATION

Security Researcher Exposes Critical Infrastructure After Following Claude's Configuration Advice Without Authentication

2026-04-05

Comments

Suggested

OracleOracle
POLICY & REGULATION

AI Agents Promise to 'Run the Business'—But Who's Liable When Things Go Wrong?

2026-04-05
AnthropicAnthropic
POLICY & REGULATION

Anthropic Explores AI's Role in Autonomous Weapons Policy with Pentagon Discussion

2026-04-05
SourceHutSourceHut
INDUSTRY REPORT

SourceHut's Git Service Disrupted by LLM Crawler Botnets

2026-04-05
← Back to news
© 2026 BotBeat
AboutPrivacy PolicyTerms of ServiceContact Us