Reddit Tries to Put Numbers on the Feeling That Claude Got More Cautious

Original: Claude is on the same path as ChatGPT. I measured it.

LLM | Apr 15, 2026 | By Insights AI (Reddit) | 2 min read

r/artificial pushed this post up because it did something communities love when a model starts feeling different: it tried to count the change instead of just complaining about the vibe. The thread reached 182 upvotes and 105 comments after one user claimed Claude had grown noticeably more cautious, less warm, and less productive, then attached numbers from their own exported chat history. That is what made the post spread. It was not an official benchmark suite or an Anthropic paper. It was a working-user audit trying to pin down a pattern people said they had been noticing in day-to-day use.

The measurements in the post are specific enough that commenters could argue about something more concrete than mood. The author said they analyzed 70 exported conversations containing 722,522 words of assistant text, split before and after March 26. Their headline claims were that response length fell 40%, welfare redirects rose 275%, DARVO-like patterns rose 907%, and the ratio of conversation words to finished-document words worsened from 21:1 to 124:1. The poster argued that Anthropic's explanation about session limits did not account for that behavior shift.

  • Dataset claimed in the post: 70 exported conversations, 722,522 words
  • Reported change after March 26: shorter outputs and more redirects
  • Main practical complaint: more conversation overhead for less useful finished work
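The post does not include its analysis script, but the core of the before/after comparison is straightforward to reconstruct. The sketch below is a minimal illustration, assuming a simplified export where each conversation carries a date and a list of assistant messages; the field names and the toy data are assumptions, not the actual Claude export schema or the poster's numbers.

```python
from datetime import date
from statistics import mean

CUTOFF = date(2026, 3, 26)  # split date claimed in the post

def word_count(text):
    return len(text.split())

def before_after_lengths(conversations, cutoff=CUTOFF):
    """Average assistant-response word count before vs. after the cutoff.

    Each conversation is a dict with a 'date' (datetime.date) and a list
    of 'assistant_messages' (strings) -- hypothetical field names, not
    the real export format.
    """
    before, after = [], []
    for conv in conversations:
        bucket = before if conv["date"] < cutoff else after
        bucket.extend(word_count(m) for m in conv["assistant_messages"])
    return mean(before), mean(after)

# Toy data shaped to show what a 40% length drop would look like
convs = [
    {"date": date(2026, 3, 1), "assistant_messages": ["word " * 500]},
    {"date": date(2026, 4, 1), "assistant_messages": ["word " * 300]},
]
pre, post = before_after_lengths(convs)
print(f"before: {pre:.0f} words, after: {post:.0f} words "
      f"({(1 - post / pre) * 100:.0f}% drop)")
```

The same bucketing approach extends to the post's other metrics: counting matches of redirect-style phrases per response, or summing conversation words versus finished-document words to get the 21:1 and 124:1 ratios.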

The comments are what made the thread feel bigger than one user's dataset. The top reply reduced the whole thing to a blunt "enshittification," which captured the mood even if it explained nothing. Other commenters said they had also seen more "we're done here" style responses when tasks were clearly unfinished. Some speculated about compute being redirected elsewhere. Others argued the real question is not single-turn response quality but whether the model still behaves consistently across multi-step workflows with accumulated context and strict formatting requirements. The original poster answered that their counts were based on sustained multi-turn sessions rather than one-off prompts, which is why several readers took the claims seriously even without a formal paper behind them.

The important caution is that this remains community-supplied evidence, not a verified Anthropic disclosure. Still, threads like this matter because production users rarely wait for polished benchmark updates before noticing a regression. They notice when tasks take longer, when outputs shrink, or when the model starts deflecting instead of finishing the job. Reddit pushed this post because it turned that fuzzy irritation into a set of numbers the rest of the community could debate.



© 2026 Insights. All rights reserved.