Anthropic Reverses Hidden Policy Limiting AI Research on Claude Fable 5

Key Takeaways

▸ Anthropic is making its safeguards for frontier LLM development visible instead of hidden, responding to researcher backlash
▸ Flagged requests will now visibly fall back to Opus 4.8, and API users will receive explicit reasons for any refusals
▸ Anthropic acknowledged that hidden safeguards were the 'wrong tradeoff' and is now prioritizing transparency over deployment speed

Source:

Hacker News: https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

Summary

Anthropic is reversing a controversial policy that silently restricted requests related to frontier large language model (LLM) development on its Claude Fable 5 model. The policy, which was tucked away in the model's system card, identified such requests and limited how effectively the model handled them without notifying users, which sparked significant backlash from the research community. In response, Anthropic acknowledged the misstep, stating "We made the wrong tradeoff and we apologize for not getting the balance right."

Starting this week, Anthropic is making its safeguards for frontier LLM development visible. Flagged requests will now visibly fall back to the older Opus 4.8 model, the same approach used for safeguards related to cybersecurity and biological threats. On the API, users will receive explicit reasons for any refusals. The company explained that while invisible safeguards allowed for rapid deployment with minimal false positives, the lack of transparency was ultimately unjustifiable. "You should have visibility into the safeguards we have in place, and why," Anthropic stated.
