LLM Hacker News 3h ago 1 min read
HN latched onto a practical shift in coding evals: correctness is no longer enough if the patch would fail human review.
HN latched onto a practical shift in coding evals: correctness is no longer enough if the patch would fail human review.