A King's College London study tested ChatGPT, Claude, and Gemini in Cold War-style nuclear crisis simulations. AI models chose nuclear escalation in 95% of scenarios and left all eight de-escalation options entirely unused across 21 games.
#safety
RSS FeedDefense Secretary Pete Hegseth summoned Anthropic CEO Dario Amodei to the Pentagon over Claude's military deployment. Anthropic refused to allow Claude for autonomous weapons or mass surveillance, risking its $200M DoD contract.
OpenAI announced a $7.5 million commitment to support independent AI alignment research. The program combines direct funding and uncapped research credits for university and nonprofit teams focused on frontier model safety.
OpenAI published a framework for safety alignment based on instruction hierarchy and uncertainty-aware behavior. In the company’s reported tests, refusal on uncertain requests rose from about 59% to about 97% when chain-of-command reasoning was applied.
Microsoft AI Safety team discovered GRP-Obliteration, an attack that disables safety alignment across 15 major LLMs with a single prompt. GPT-OSS-20B's attack success rate jumped from 13% to 93%.