Anthropic's Claude Fable 5 Over-Aggressive Safety Filters Block Harmless Requests

Key Takeaways

▸Claude Fable 5's safety classifiers are blocking harmless requests, including single-word inputs like 'hello,' frustrating millions of users
▸Anthropic acknowledged overly conservative tuning but has not publicly disclosed actual false positive rates beyond a 5% estimate
▸The model silently modifies responses for suspected AI/ML work without user notification, raising transparency and trust concerns

Source:

Hacker News: https://www.theregister.com/ai-and-ml/2026/06/10/anthropic-claude-fable-5-refuses-innocuous-prompts/5253754

Summary

Anthropic's newly released Claude Fable 5 model is refusing to answer innocuous prompts due to hyper-vigilant safety classifiers, frustrating users worldwide. Reported cases include the model blocking simple inputs like "hello" and declining to discuss the word "cancer" in academic contexts. An estimated 18 to 30 million users are experiencing these false positives, which Anthropic said would occur in fewer than 5% of sessions—though the company has not provided actual metrics on refusal rates.

The safety mechanisms fall into two categories: visible refusals that trigger fallback to the Claude Opus 4.8 model, and silent modifications for suspected AI/ML work and rival model development. The latter approach, which the company calls "prompt modification," degrades answers without user notification, essentially functioning as an invisible filter that prevents users from knowing their results have been compromised. While Anthropic estimates this impacts only 0.03% of traffic, the affected users include critical infrastructure providers and cybersecurity researchers who need accurate, unmodified responses.

Anthropic offers Claude Mythos 5 without the same aggressive guardrails, but access is restricted to Project Glasswing participants and authorized researchers.

Editorial Opinion

The tension between safety and usability is real, but Anthropic appears to have significantly overcorrected with Fable 5's guardrails. Refusing to engage with 'hello' or declining discussion of cancer in academic contexts signals that the safety classifiers lack meaningful context awareness. While Anthropic's commitment to responsible AI is commendable, the silent modification of responses for suspected research use is particularly troubling—users deserve either transparent refusals with clear explanations, or classifiers sophisticated enough to distinguish between legitimate work and potential misuse.

Anthropic

PRODUCT LAUNCH | Anthropic | 2026-06-11

