Anthropic AI Safety Research Watch: Bug Bounty, Petri, and Alignment Papers

A look at Anthropic's concentrated May safety push: a public bug bounty on HackerOne, the open-source donation of Petri, principle-based alignment research, reading Claude's thoughts with NL autoencoders, and the elimination of a blackmail behavior traced to sci-fi training data.
