LLM Hacker News 5h ago 1 min read
HN latched onto a practical shift in coding evals: correctness is no longer enough if the patch would fail human review.
HN latched onto a practical shift in coding evals: correctness is no longer enough if the patch would fail human review.