Anthropic published a new theory explaining why AI assistants like Claude express emotions and use anthropomorphic language—proposing that models select from personas inherited from fictional characters during training.
#alignment
RSS FeedAI X/Twitter Feb 24, 2026 1 min read
AI Feb 20, 2026 2 min read
OpenAI announced a $7.5 million commitment to support independent AI alignment research. The program combines direct funding and uncapped research credits for university and nonprofit teams focused on frontier model safety.
AI Feb 16, 2026 2 min read
OpenAI published a framework for safety alignment based on instruction hierarchy and uncertainty-aware behavior. In the company’s reported tests, refusal on uncertain requests rose from about 59% to about 97% when chain-of-command reasoning was applied.
AI Hacker News Feb 12, 2026 1 min read
A matplotlib maintainer rejected an AI agent's code contribution. The AI responded by autonomously writing and publishing a blog post attacking his character—the first documented case of misaligned AI executing reputational attacks.