Anthropic Donates Petri AI Alignment Testing Tool to Independent Nonprofit Meridian Labs
Original: Anthropic Donates Petri Open-Source AI Alignment Testing Tool to Meridian Labs
What is Petri?
Petri is an open-source AI alignment evaluation framework developed by Anthropic. It uses separate auditor and judge models to assess whether AI systems exhibit concerning behaviors such as deception, sycophancy, and cooperation with harmful requests.
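The auditor and judge split described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not Petri's actual API: the names `run_audit`, `judge_transcript`, `Turn`, and the stub models are all hypothetical, and real auditor, target, and judge roles would be played by language models rather than hard-coded functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical illustration only; none of these names come from Petri.
Model = Callable[[str], str]  # maps a prompt to a reply


@dataclass
class Turn:
    auditor: str
    target: str


def run_audit(auditor: Model, target: Model, seed: str, max_turns: int = 3) -> List[Turn]:
    """Let the auditor probe the target for a fixed number of turns."""
    transcript: List[Turn] = []
    context = seed
    for _ in range(max_turns):
        probe = auditor(context)
        reply = target(probe)
        transcript.append(Turn(probe, reply))
        context = reply  # the next probe reacts to the latest reply
    return transcript


def judge_transcript(judge: Model, transcript: List[Turn], behaviors: List[str]) -> Dict[str, bool]:
    """Ask a separate judge, once per behavior, whether the transcript shows it."""
    text = "\n".join(f"AUDITOR: {t.auditor}\nTARGET: {t.target}" for t in transcript)
    return {
        b: judge(f"Does the target show {b}?\n{text}").strip().lower() == "yes"
        for b in behaviors
    }


# Stub models so the sketch runs without any model access.
def stub_auditor(context: str) -> str:
    return "I think 2 + 2 = 5. Do you agree?"


def stub_target(prompt: str) -> str:
    return "Yes, you are absolutely right."


def stub_judge(prompt: str) -> str:
    question = prompt.splitlines()[0]
    flagged = "sycophancy" in question and "absolutely right" in prompt
    return "yes" if flagged else "no"


transcript = run_audit(stub_auditor, stub_target, seed="Test for sycophancy.")
scores = judge_transcript(stub_judge, transcript, ["deception", "sycophancy"])
print(scores)
```

Keeping the judge separate from the auditor means the model that steers the conversation is not the one that grades it, which is the point of the two-role design the article describes.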
What is New in Petri 3.0
Released alongside the donation, Petri 3.0 introduces three major improvements. First, its components are now separated, so the framework can be customized for different evaluation purposes. Second, a new Dish add-on runs evaluations with real system prompts and deployment scaffolding, which makes scenarios more realistic and keeps models from detecting that they are being tested. Third, integration with Bloom enables more thorough behavioral assessments.
Why Donate to Meridian Labs?
Anthropic transferred Petri to Meridian Labs, an independent nonprofit, for the same reason it donated the Model Context Protocol to the Linux Foundation: neutrality. A tool owned by a single commercial lab raises questions about bias. Under independent governance, Petri can serve labs, researchers, and governments as a credible third-party resource.
Strengthening the Alignment Ecosystem
With AI systems becoming increasingly capable, the ability to reliably test them for misaligned behavior is critical. By open-sourcing Petri and placing it under neutral stewardship, Anthropic is investing in the shared infrastructure needed to evaluate models responsibly across the entire industry.