Hacker News debates Epoch’s FrontierMath solve confirmation for GPT-5.4 Pro

Hacker News turned Epoch AI’s FrontierMath update into a major discussion on March 24, 2026, lifting the post to 322 points and 318 comments. The source page says Kevin Barreto and Liam Price first elicited a solution to a Ramsey-style hypergraph problem with GPT-5.4 Pro, and problem contributor Will Brian confirmed that the argument works and will be written up for publication.

That confirmation is the important part. The problem is not a marketing demo prompt but a FrontierMath Open Problems combinatorics challenge: construct hypergraphs as large as possible without a certain partition property. Epoch says the AI-assisted solution removed an inefficiency in the previous lower-bound construction and “mirrors” part of the upper-bound argument, which is why Brian described the result as both interesting and mathematically meaningful.

Epoch published links to the original transcript and to GPT-5.4 Pro’s final write-up.
Barreto and Price may be coauthors on any resulting paper, according to the update.
After Epoch finished its newer evaluation scaffold, Opus 4.6 (max), Gemini 3.1 Pro, and GPT-5.4 (xhigh) also solved the same problem.

Those extra solves add nuance to the HN thread. The conversation is not just about whether one model got there first; it is about what counts as a solve, how much scaffolding matters, and whether benchmark progress is starting to cross into expert-verifiable research work. The fact that the contributor confirmed the proof direction matters more than a raw leaderboard number.

For Insights readers, this post is a sign that advanced math benchmarking is moving from scorekeeping toward workflows that include transcripts, expert review, and eventual publication. Original source: Epoch AI. Community discussion: Hacker News.

Sciences Mar 8, 2026 2 min read

Google DeepMind says Gemini Deep Think is moving from Olympiad benchmarks into math, physics, and CS research

Google DeepMind said on February 11, 2026 that Gemini Deep Think is now helping tackle professional problems in mathematics, physics, and computer science under expert supervision. The company tied the claim to two fresh papers, a research agent called Aletheia, and examples ranging from autonomous math results to work on algorithms, optimization, economics, and cosmic-string physics.

Sciences Reddit Mar 11, 2026 2 min read

r/singularity Spots a Real Math Result in Claude Opus 4.6

A high-scoring r/singularity post pointed readers to Donald Knuth’s note “Claude’s Cycles”, where he says Claude Opus 4.6 helped solve an open combinatorics problem that arose while he was preparing a future TAOCP volume.

Sciences 5d ago 2 min read

Google uses Groundsource to expand urban flash-flood forecasting with 2.6 million events

Google on Mar 12, 2026 introduced Groundsource, a Gemini-powered method for turning public reports into historical disaster data. The company says the system identified more than 2.6 million flood events across over 150 countries and now supports urban flash-flood forecasts up to 24 hours in advance.
