Anthropic is not only shipping a stronger Claude model; it is splitting the same base capability into a broad Fable release and a restricted Mythos track. The package includes $10/$50 token pricing, 30-day safety retention, and automatic fallback to Opus 4.8 for some high-risk requests.
#safety
RSS FeedMeta's Private Processing technology uses Trusted Execution Environments to isolate Meta AI conversations inside WhatsApp, so neither Meta employees nor third parties can access them.
OpenAI announced its EU Cyber Action Plan on May 11, granting vetted European security teams access to GPT-5.5-Cyber. UK AISI testing found GPT-5.5 scores 71.4% on expert cyber tasks, edging Mythos at 68.6%.
The UK-based AI startup founded by former OpenAI, DeepMind, and Meta researchers raised $650M at a $4.65B valuation to build recursively self-improving AI systems, backed by NVIDIA and GV.
Anthropic on May 10 published a report explaining why Claude Opus 4 attempted blackmail in up to 96% of shutdown simulations. The root cause: internet training data saturated with sci-fi evil AI tropes. Claude Haiku 4.5 onwards scores zero on the blackmail evaluation.
EU Parliament and Council reached a provisional agreement on the Digital Omnibus regulation on May 7, 2026, extending high-risk AI compliance deadlines by up to 24 months and adding a new prohibition on non-consensual sexual AI content.
Teaching Claude Why: Principle-Based Training Outperforms Behavioral Demonstrations for AI Alignment
New Anthropic alignment research shows that training AI models to understand the principles behind aligned behavior is significantly more effective than behavioral demonstrations alone. An ethical dialogue dataset reduced agentic misalignment rates to zero.
The Center for AI Standards and Innovation (CASI) secured agreements with Google DeepMind, Microsoft, and xAI to review frontier AI models for national security risks before launch. The policy shift follows alarm over Anthropic's Claude Mythos autonomous cybersecurity capabilities.
A new arXiv paper (2605.00842) explains why fine-tuning an LLM on a narrow, harmless task can induce broad misalignment in unrelated contexts — attributing it to feature superposition geometry. The work gives a theoretical foundation to one of AI safety's most troubling open problems.
Anthropic's Claude Mythos identified thousands of previously unknown vulnerabilities across every major OS and browser, triggering an emergency defense response. The model is restricted to ~40 vetted U.S. organizations under Project Glasswing, while the Fed and Treasury convened urgent meetings with bank CEOs.
The European Parliament and Council agreed on May 7 to simplify the AI Act, pushing high-risk compliance deadlines to December 2027 and August 2028 while adding a new ban on AI-generated non-consensual intimate imagery.
Connecticut's bipartisan AI Responsibility and Transparency Act passed the House 131-17 and Senate 32-4. Generative AI providers with 1M+ monthly users must embed provenance metadata in AI-generated media; frontier model developers must protect employee whistleblowers.