Hacker News debates Epoch’s FrontierMath solve confirmation for GPT-5.4 Pro
Original: Epoch confirms GPT5.4 Pro solved a frontier math open problem
Hacker News turned Epoch AI’s FrontierMath update into a major discussion on March 24, 2026, lifting the post to 322 points and 318 comments. The source page says Kevin Barreto and Liam Price first elicited a solution to a Ramsey-style hypergraph problem with GPT-5.4 Pro, and problem contributor Will Brian confirmed that the argument works and will be written up for publication.
That confirmation is the important part. The problem is not a marketing demo prompt but a FrontierMath Open Problems combinatorics challenge: construct hypergraphs as large as possible without a certain partition property. Epoch says the AI-assisted solution removed an inefficiency in the previous lower-bound construction and “mirrors” part of the upper-bound argument, which is why Brian described the result as both interesting and mathematically meaningful.
- Epoch published links to the original transcript and to GPT-5.4 Pro’s final write-up.
- Barreto and Price may be coauthors on any resulting paper, according to the update.
- Once Epoch completed its new evaluation scaffold, Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh) also solved the same problem.
Those extra solves add nuance to the HN thread. The conversation is not just about which model got there first. It is about what counts as a solve, how much scaffolding matters, and whether benchmark progress is starting to cross into expert-verifiable research work. The contributor's confirmation that the argument actually works matters more than a raw leaderboard number.
For Insights readers, this post is a sign that advanced math benchmarking is moving from scorekeeping toward workflows that include transcripts, expert review, and eventual publication. Original source: Epoch AI. Community discussion: Hacker News.