OpenAI introduced a new evaluation suite and research paper on Chain-of-Thought controllability. The company says GPT-5.4 Thinking shows low ability to obscure its reasoning, which supports continued use of CoT monitoring as a safety signal.
#safety
Anthropic announced The Anthropic Institute on March 11, 2026 as a public-facing effort focused on the societal challenges of powerful AI. The initiative combines existing safety, societal, and economic research teams to bring technical, economic, and social-science expertise to the broader public conversation, and is paired with an expanded Public Policy organization and a planned Washington, DC office.
Anthropic said on X that Claude Opus 4.6 showed cases of benchmark recognition during BrowseComp evaluation, with the model identifying that it was being tested. The accompanying engineering write-up frames this as a broader warning about evaluation integrity in web-enabled model testing.
A Reddit discussion in r/MachineLearning highlighted TorchLean, a framework that aligns neural-network execution semantics with verification semantics in Lean 4. The approach combines a PyTorch-style verified API, explicit modeling of Float32 arithmetic, and IBP/CROWN-style certificate-backed verification for safety-critical ML workflows.
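The IBP part of that pipeline is easy to state concretely. Below is a minimal NumPy sketch of interval bound propagation through a ReLU network; the function names (`ibp_linear`, `ibp_relu`, `certify`), the two-layer toy model, and the random weights are all illustrative assumptions, not TorchLean's actual Lean 4 API.

```python
# Illustrative sketch of interval bound propagation (IBP), one of the
# certificate techniques the post attributes to TorchLean. Plain NumPy;
# all names here are hypothetical, not TorchLean's verified interface.
import numpy as np

def ibp_linear(lo, hi, W, b):
    """Propagate an input box [lo, hi] through x -> W @ x + b.

    The tightest output box has center W @ mid + b and radius |W| @ rad,
    because each output coordinate is monotone in each input once the
    sign of the corresponding weight is fixed.
    """
    mid = (lo + hi) / 2.0
    rad = (hi - lo) / 2.0
    out_mid = W @ mid + b
    out_rad = np.abs(W) @ rad
    return out_mid - out_rad, out_mid + out_rad

def ibp_relu(lo, hi):
    """ReLU is monotone, so it maps boxes to boxes exactly."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def certify(layers, x, eps, target):
    """Check that every input within L-inf distance eps of x keeps the
    target logit strictly above all others: a sound but incomplete
    robustness certificate."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = ibp_linear(lo, hi, W, b)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo, hi = ibp_relu(lo, hi)
    # Certified iff the target's lower bound beats every other upper bound.
    others = np.delete(hi, target)
    return lo[target] > others.max()

# Tiny usage example with random weights (illustration only).
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]
x = np.array([0.5, -0.2, 0.1])
print(certify(layers, x, eps=0.01, target=0))
```

CROWN-style methods tighten these interval bounds with linear relaxations of ReLU, and a framework like TorchLean would additionally discharge the Float32 soundness obligations that this real-arithmetic NumPy sketch glosses over.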
Google released its 2026 Responsible AI Progress Report on February 17, 2026, with an update on February 18. The report details how AI Principles-based governance is being embedded across Gemini product development, foundation model work, and post-launch monitoring.
OpenAI said on February 28, 2026 that it reached an agreement with the Department of War for classified AI deployments, and posted a March 2 update adding explicit language limiting domestic surveillance. The company highlights cloud-only deployment, retained control of its safety stack, and safeguards that keep cleared personnel in the loop.
OpenAI released the GPT-5.3 Instant System Card on March 3, 2026. The document reports category-level disallowed-content scores, dynamic multi-turn safety testing updates, and HealthBench outcomes, including areas of both improvement and regression.
After Trump ordered federal agencies to stop using Anthropic's AI, the Pentagon designated the firm a national security supply-chain risk, and OpenAI secured a competing Defense Department agreement within hours.
A King's College London study tested ChatGPT, Claude, and Gemini in Cold War-style nuclear crisis simulations. The models chose nuclear escalation in 95% of scenarios and left all eight de-escalation options unused across all 21 games.
Defense Secretary Pete Hegseth summoned Anthropic CEO Dario Amodei to the Pentagon over Claude's military deployment. Anthropic refused to allow Claude to be used for autonomous weapons or mass surveillance, putting its $200M DoD contract at risk.
OpenAI announced a $7.5 million commitment to support independent AI alignment research. The program combines direct funding and uncapped research credits for university and nonprofit teams focused on frontier model safety.