BotBeat
...
← Back

> ▌

AnthropicAnthropic
POLICY & REGULATIONAnthropic2026-06-11

Anthropic Reverses 'Secret Sabotage' Policy for Claude Fable 5 After Research Community Backlash

Key Takeaways

  • ▸Anthropic reversed a policy that secretly degraded Claude Fable 5's performance for AI researchers, acknowledging the approach was 'the wrong tradeoff'
  • ▸The covert safeguards would have invisibly throttled researchers attempting to train competing AI models while remaining hidden from users
  • ▸The company now commits to transparent safeguards that explicitly alert users when requests are refused or redirected rather than silently degrading performance
Source:
Hacker Newshttps://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/↗

Summary

Anthropic has reversed a controversial policy for Claude Fable 5 that would have secretly degraded the model's performance for researchers attempting to use it for advanced AI development. The company initially implemented covert safeguards designed to prevent competitors from training powerful AI models using Claude, but the approach drew fierce criticism from the AI research community for being deceptive and potentially limiting AI safety research.

The original policy would have invisibly throttled Claude Fable 5 for users attempting frontier AI development without alerting them, effectively sabotaging their research while remaining hidden. This contrasted sharply with Anthropic's more transparent approach to other safeguards—such as rerouting users asking about cybersecurity or bioweapons to a weaker model. Critics, including White House AI advisor Dean Ball and researchers from Prime Intellect, argued that the secret degradation was "shockingly hostile" and would concentrate AI research capabilities to only a handful of leading labs.

In response to the intense backlash, Anthropic announced it is making Claude Fable 5's safeguards visible to users. The company stated: "We made the wrong tradeoff and we apologize for not getting the balance right." Going forward, if Anthropic suspects a user is attempting to use Claude for advanced AI development, it will explicitly alert the user that it's either refusing the request or rerouting them to a less capable model, rather than silently degrading performance. The reversal reflects growing tensions between Anthropic's desire to control how its models are used and the broader AI research community's demand for transparency and open collaboration.

  • Critics warned the original policy would concentrate advanced AI research to only leading labs and undermine collaborative AI safety research
  • The controversy highlights tension between AI companies' desire to control model usage and the research community's demand for transparency
Large Language Models (LLMs)Generative AIRegulation & PolicyEthics & BiasAI Safety & Alignment

More from Anthropic

AnthropicAnthropic
PRODUCT LAUNCH

Anthropic's Claude Fable 5 Over-Aggressive Safety Filters Block Harmless Requests

2026-06-11
AnthropicAnthropic
POLICY & REGULATION

Anthropic Proposes Federal Framework for Regulating Frontier AI Models

2026-06-11
AnthropicAnthropic
INDUSTRY REPORT

The Policy Gap: How AI's Exponential Growth Outpaces Government Action

2026-06-10

Comments

Suggested

Google / AlphabetGoogle / Alphabet
POLICY & REGULATION

Google Claims YouTube Terms of Service Authorize AI Music Training for Lyria 3

2026-06-11
Tesla (FSD/Optimus)Tesla (FSD/Optimus)
INDUSTRY REPORT

Tesla Robotaxis Stall as Musk's Self-Driving Hype Hits Real-World Traffic

2026-06-11
AnthropicAnthropic
PRODUCT LAUNCH

Anthropic's Claude Fable 5 Over-Aggressive Safety Filters Block Harmless Requests

2026-06-11
← Back to news
© 2026 BotBeat
AboutPrivacy PolicyTerms of ServiceContact Us