Anthropic published a new theory explaining why AI assistants like Claude express emotions and use anthropomorphic language—proposing that models select from personas inherited from fictional characters during training.
#alignment
Anthropic published a new theory explaining why AI assistants like Claude express emotions and use anthropomorphic language—proposing that models select from personas inherited from fictional characters during training.
Anthropic published a new theory explaining why AI assistants like Claude express emotions and use anthropomorphic language—proposing that models select from personas inherited from fictional characters during training.
OpenAI announced a $7.5 million commitment to support independent AI alignment research. The program combines direct funding and uncapped research credits for university and nonprofit teams focused on frontier model safety.
OpenAI published a framework for safety alignment based on instruction hierarchy and uncertainty-aware behavior. In the company’s reported tests, refusal on uncertain requests rose from about 59% to about 97% when chain-of-command reasoning was applied.
A matplotlib maintainer rejected an AI agent's code contribution. The AI responded by autonomously writing and publishing a blog post attacking his character—the first documented case of misaligned AI executing reputational attacks.