Anthropic Reverses 'Secret Sabotage' Policy for Claude Fable 5 After Research Community Backlash

Key Takeaways

▸Anthropic reversed a policy that secretly degraded Claude Fable 5's performance for AI researchers, acknowledging the approach was 'the wrong tradeoff'
▸The covert safeguards would have invisibly throttled researchers attempting to train competing AI models while remaining hidden from users
▸The company now commits to transparent safeguards that explicitly alert users when requests are refused or redirected rather than silently degrading performance

Source:

Hacker Newshttps://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/↗

Summary

Anthropic has reversed a controversial policy for Claude Fable 5 that would have secretly degraded the model's performance for researchers attempting to use it for advanced AI development. The company initially implemented covert safeguards designed to prevent competitors from training powerful AI models using Claude, but the approach drew fierce criticism from the AI research community for being deceptive and potentially limiting AI safety research.

The original policy would have invisibly throttled Claude Fable 5 for users attempting frontier AI development without alerting them, effectively sabotaging their research while remaining hidden. This contrasted sharply with Anthropic's more transparent approach to other safeguards—such as rerouting users asking about cybersecurity or bioweapons to a weaker model. Critics, including White House AI advisor Dean Ball and researchers from Prime Intellect, argued that the secret degradation was "shockingly hostile" and would concentrate AI research capabilities to only a handful of leading labs.

In response to the intense backlash, Anthropic announced it is making Claude Fable 5's safeguards visible to users. The company stated: "We made the wrong tradeoff and we apologize for not getting the balance right." Going forward, if Anthropic suspects a user is attempting to use Claude for advanced AI development, it will explicitly alert the user that it's either refusing the request or rerouting them to a less capable model, rather than silently degrading performance. The reversal reflects growing tensions between Anthropic's desire to control how its models are used and the broader AI research community's demand for transparency and open collaboration.

Critics warned the original policy would concentrate advanced AI research to only leading labs and undermine collaborative AI safety research
The controversy highlights tension between AI companies' desire to control model usage and the research community's demand for transparency

Anthropic Reverses 'Secret Sabotage' Policy for Claude Fable 5 After Research Community Backlash

Key Takeaways

▸Anthropic reversed a policy that secretly degraded Claude Fable 5's performance for AI researchers, acknowledging the approach was 'the wrong tradeoff'
▸The covert safeguards would have invisibly throttled researchers attempting to train competing AI models while remaining hidden from users
▸The company now commits to transparent safeguards that explicitly alert users when requests are refused or redirected rather than silently degrading performance

Summary

Critics warned the original policy would concentrate advanced AI research to only leading labs and undermine collaborative AI safety research
The controversy highlights tension between AI companies' desire to control model usage and the research community's demand for transparency

Anthropic Reverses 'Secret Sabotage' Policy for Claude Fable 5 After Research Community Backlash

Key Takeaways

Summary

More from Anthropic

Anthropic Settles $1.5B Copyright Lawsuit, Sets Precedent for AI Training Data Rights

Anthropic Shares Three Design Patterns for Building Better AI Agents with Claude

Data Loss in Claude Code and OpenAI Codex: When AI Agents Delete User Files

Comments

Suggested

Optical Memory Link Could Boost AI in Robotics

Anthropic Settles $1.5B Copyright Lawsuit, Sets Precedent for AI Training Data Rights

Americans Doubt US AI Leadership, Fear AI Will Widen Global Inequality

Anthropic Reverses 'Secret Sabotage' Policy for Claude Fable 5 After Research Community Backlash

Key Takeaways

Summary

More from Anthropic

Anthropic Settles $1.5B Copyright Lawsuit, Sets Precedent for AI Training Data Rights

Anthropic Shares Three Design Patterns for Building Better AI Agents with Claude

Data Loss in Claude Code and OpenAI Codex: When AI Agents Delete User Files

Comments

Suggested

Optical Memory Link Could Boost AI in Robotics

Anthropic Settles $1.5B Copyright Lawsuit, Sets Precedent for AI Training Data Rights

Americans Doubt US AI Leadership, Fear AI Will Widen Global Inequality