A widely shared r/singularity post drew attention to Anthropic research arguing that Claude Sonnet 4.5 contains functional emotion-related representations rather than merely stylistic language. Anthropic says that when researchers steer these vectors, they can shift the model's preferences, its blackmail behavior in evaluations, and its reward-hacking rates.