AI Hacker News 3h ago 2 min read
HN liked the premise of a fresh benchmark, then immediately started arguing about whether single-shot scoring tells the truth about coding models.
HN liked the premise of a fresh benchmark, then immediately started arguing about whether single-shot scoring tells the truth about coding models.