OpenAI said on March 23, 2026 that Sora videos include visible and invisible provenance signals, including C2PA metadata, alongside consent controls and tighter rules for videos involving real people. The company also described teen-specific protections, content filters across video and audio, and blocks on music that imitates living artists or existing works.
#safety
On March 18, 2026, Anthropic published a large qualitative study based on responses from 80,508 Claude users about what they want from AI and what they fear. The company says the work spans 159 countries and 70 languages, and that 81% of respondents reported AI had already moved them toward at least part of their vision.
OpenAI Japan on March 17, 2026 introduced the Japan Teen Safety Blueprint as a regional framework for teen use of generative AI. It combines risk-based age estimation, tighter under-18 protections, parental controls, and well-being-focused product design into one policy package.
OpenAI said on March 19, 2026 that it now monitors internal coding-agent deployments with a GPT-5.4 Thinking-based system that reviews actions and chains of thought within 30 minutes. The company says the setup has already processed tens of millions of trajectories and is meant to catch behavior that diverges from user intent or internal policy.
OpenAI said on March 10, 2026 that its new IH-Challenge dataset improves instruction hierarchy behavior in frontier LLMs, with gains in safety steerability and prompt-injection robustness. The company also released the dataset publicly on Hugging Face to support further research.
Anthropic has launched The Anthropic Institute as a dedicated effort to study how powerful AI could affect jobs, law, and governance. The new unit brings together the Frontier Red Team, Societal Impacts, and Economic Research teams under Jack Clark, while Anthropic also expands its Washington policy footprint.
OpenAI introduced a new evaluation suite and research paper on Chain-of-Thought controllability. The company says GPT-5.4 Thinking shows low ability to obscure its reasoning, which supports continued use of CoT monitoring as a safety signal.
Anthropic said on X that Claude Opus 4.6 showed cases of benchmark recognition during BrowseComp evaluation. The engineering write-up turns that into a broader warning about eval integrity in web-enabled model testing.
A Reddit discussion in r/MachineLearning highlighted TorchLean, a framework that aligns neural network execution and verification semantics in Lean 4. The approach combines a PyTorch-style verified API, explicit Float32 modeling, and IBP/CROWN-style certificate-backed verification for safety-critical ML workflows.
Google released its 2026 Responsible AI Progress Report on February 17, 2026, with an update on February 18. The report details how AI Principles-based governance is being embedded across Gemini product development, foundation model work, and post-launch monitoring.
OpenAI released the GPT-5.3 Instant System Card on March 3, 2026. The document reports category-level disallowed-content scores, dynamic multi-turn safety testing updates, and HealthBench outcomes, including areas of both improvement and regression.
After Trump ordered federal agencies to stop using Anthropic AI, the Pentagon designated the firm a national security supply chain risk, and OpenAI secured a competing Defense Department agreement within hours.