Anthropic Reverses Hidden Policy Limiting AI Research on Claude Fable 5
Key Takeaways
- ▸Anthropic is making its safeguards around frontier LLM development visible rather than hidden, in response to researcher backlash
- ▸Flagged requests will now visibly fall back to Opus 4.8, and API users will receive explicit reasons for any refusals
- ▸Anthropic acknowledged that hidden safeguards were the "wrong tradeoff" and is now prioritizing transparency over deployment speed
Summary
Anthropic is reversing a controversial policy that silently restricted requests related to frontier large language model (LLM) development on its Claude Fable 5 model. The policy, which was tucked away in the model's system card, identified such requests and limited the model's effectiveness on them without notifying users, which drew significant backlash from the research community. In response, Anthropic acknowledged the misstep, stating: "We made the wrong tradeoff and we apologize for not getting the balance right."
Starting this week, Anthropic is making its safeguards for frontier LLM development visible. Flagged requests will now visibly fall back to the older Opus 4.8 model, the same approach already used for safeguards related to cybersecurity and biological threats. On the API, users will receive explicit reasons for any refusals. The company explained that while invisible safeguards allowed for rapid deployment with minimal false positives, the lack of transparency was ultimately unjustifiable. "You should have visibility into the safeguards we have in place, and why," Anthropic stated.


