Anthropic Donates Petri Alignment Tool to Meridian Labs, Releases Major v3.0 Update
Key Takeaways
- ▸Petri donated to Meridian Labs for independent, credible development separate from Anthropic
- ▸Petri v3.0 delivers major improvements in adaptability, realism, and depth of alignment testing
- ▸Tool has demonstrated real-world adoption and impact, including use by UK AI Security Institute
Summary
Anthropic announced it is donating Petri, its open-source alignment testing tool, to Meridian Labs, an AI evaluation non-profit. The move mirrors Anthropic's previous donation of the Model Context Protocol (MCP) to the Linux Foundation and ensures that Petri remains independent from any single AI lab, enhancing its credibility across the industry.
Simultaneously, Anthropic released Petri version 3.0, a major update developed in collaboration with Meridian Labs. The new version introduces three significant improvements: enhanced adaptability through modular architecture (separating auditor and target models), improved realism via a new "Dish" add-on that uses real system prompts and deployment scaffolds, and greater depth through integration with Anthropic's Bloom alignment tool for more in-depth behavioral assessments.
Petri, which launched in October 2025 as part of the Anthropic Fellows program, has become a key component of Anthropic's alignment evaluation framework, used to assess all Claude models since Claude Sonnet 4.5. The tool enables rapid testing for concerning tendencies like deception, sycophancy, and susceptibility to harmful requests. It has already gained traction with external organizations, including the UK's AI Security Institute (AISI), which integrated Petri into their model evaluation processes.
- Reflects Anthropic's commitment to open-source AI safety infrastructure alongside production models
Editorial Opinion
Anthropic's decision to donate Petri to Meridian Labs is a strategic move that prioritizes the credibility of alignment evaluation over institutional control. By releasing v3.0 with substantial improvements before the transition, Anthropic demonstrates genuine commitment to the tool's long-term success rather than using the donation as a way to simply offload maintenance. This approach strengthens the case for why major AI labs should invest in open-source alignment infrastructure—if done authentically, it elevates the entire field's ability to evaluate AI behavior rigorously and independently.

