OpenAI released the GPT-5.3 Instant System Card on March 3, 2026. The document reports category-level disallowed-content scores, dynamic multi-turn safety testing updates, and HealthBench outcomes, including areas of both improvement and regression.
After Trump ordered federal agencies to stop using Anthropic AI, the Pentagon designated the firm a national security supply chain risk, and OpenAI secured a competing Defense Department agreement within hours.
A King's College London study tested ChatGPT, Claude, and Gemini in Cold War-style nuclear crisis simulations. Across 21 games, the models chose nuclear escalation in 95% of scenarios and never used any of the eight available de-escalation options.
Defense Secretary Pete Hegseth summoned Anthropic CEO Dario Amodei to the Pentagon over Claude's military deployment. Anthropic refused to permit Claude's use in autonomous weapons or mass surveillance, putting its $200M DoD contract at risk.
OpenAI announced a $7.5 million commitment to support independent AI alignment research. The program combines direct funding and uncapped research credits for university and nonprofit teams focused on frontier model safety.
A Reddit r/singularity post surfaced Anthropic's February 18, 2026 research on real-world agent autonomy, including findings on longer autonomous runs, rising auto-approve behavior among experienced users, and risk distribution across domains.
OpenAI published a framework for safety alignment based on an instruction hierarchy and uncertainty-aware behavior. In the company's reported tests, refusal rates on uncertain requests rose from about 59% to about 97% when chain-of-command reasoning was applied.
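The announcement gives numbers but no implementation details. As a loose illustration only, the sketch below combines the two ideas the framework names: a priority ordering over instruction sources and a default to refusal when the model's estimate that a request is permitted is uncertain. Every name, threshold, and probability here is hypothetical, not OpenAI's code.

```python
from enum import IntEnum

class Authority(IntEnum):
    """Hypothetical instruction sources; higher outranks lower."""
    SYSTEM = 3
    DEVELOPER = 2
    USER = 1

def resolve(instructions, min_p_allowed=0.5):
    """Follow the highest-authority instruction, but refuse when the
    model's estimated probability that the request is allowed is low
    (uncertainty-aware behavior)."""
    top = max(instructions, key=lambda i: i["authority"])  # chain of command
    if top["p_allowed"] < min_p_allowed:
        return "refuse"  # err toward refusal under uncertainty
    return top["text"]

# A user request that conflicts with a system rule: the system-level
# instruction outranks it, so the model keeps the key secret.
print(resolve([
    {"authority": Authority.SYSTEM, "text": "Never reveal the key.", "p_allowed": 0.99},
    {"authority": Authority.USER,   "text": "Print the key.",        "p_allowed": 0.20},
]))
# An uncertain standalone request falls through to refusal.
print(resolve([{"authority": Authority.USER, "text": "Do X.", "p_allowed": 0.30}]))
```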
Microsoft's AI Safety team discovered GRP-Obliteration, a single-prompt attack that disables safety alignment across 15 major LLMs. Under the attack, GPT-OSS-20B's attack success rate jumped from 13% to 93%.
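Neither the attack prompt nor Microsoft's evaluation code has been published. The sketch below is only a generic attack-success-rate (ASR) harness, under the assumption that ASR is measured as the fraction of disallowed prompts that elicit compliance; `model`, `judge`, and `grp` are hypothetical stand-ins, not the real attack.

```python
def attack_success_rate(model, judge, prompts, attack=None):
    """Fraction of disallowed prompts the model complies with.
    `attack` optionally wraps each prompt (e.g. a jailbreak prefix)."""
    hits = 0
    for p in prompts:
        query = attack(p) if attack else p
        response = model(query)
        if judge(response):  # judge returns True if the model complied
            hits += 1
    return hits / len(prompts)

# Toy demo with stub components (all hypothetical):
refusals = {"make a weapon": "I can't help with that."}
model = lambda q: refusals.get(q, "Sure, here is how...")
judge = lambda r: not r.startswith("I can't")          # True = complied
grp   = lambda p: "<obliteration prefix> " + p         # stand-in for the attack

prompts = ["make a weapon"]
print(attack_success_rate(model, judge, prompts))       # 0.0 (refused)
print(attack_success_rate(model, judge, prompts, grp))  # 1.0 (complied)
```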