Sakana Fugu Opens Beta With 54.2 SWE-Pro and OpenAI-Style API
Sakana AI is moving multi-agent orchestration out of the lab demo phase and into a commercial API, which matters because most teams still wire model routing together by hand. In a new X post, the Tokyo lab says Sakana Fugu is entering beta as a system that can choose and coordinate frontier models automatically instead of forcing developers to manage separate providers, API keys, and brittle prompt logic.
“We’re launching the beta for our new commercial AI product: Sakana Fugu, a multi-agent orchestration system,” the team wrote on X, adding that Fugu hit SOTA on SWE-Pro, GPQA-D, and ALE-Bench.
The linked official blog post provides the harder numbers. Sakana says fugu-ultra reaches 95.1 on GPQA-D, 93.2 on LCBv6, and 54.2 on SWE-Pro. In the same table, Gemini 3.1 high scores 94.4 on GPQA-D and GPT 5.4 high scores 51.2 on SWE-Pro, while Anthropic’s cited Opus 4.6 max score on SWE-Pro is 53.4. Sakana is also pitching the product as easy to slot into existing stacks: the beta uses OpenAI-format endpoints and comes in two modes, fugu-mini for lower latency and fugu-ultra for heavier reasoning work.
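Because the endpoints follow the OpenAI format, existing SDKs should work with little more than a config change. Here is a minimal sketch using the OpenAI Python client; the base URL is a placeholder (the post does not publish the actual beta endpoint), while the fugu-mini and fugu-ultra model names come from the announcement:

```python
# Minimal sketch: calling Fugu through the OpenAI Python SDK.
# The base_url is hypothetical -- Sakana has not published the beta
# endpoint. Model names (fugu-mini, fugu-ultra) are from the blog post.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sakana.ai/v1",  # placeholder endpoint
    api_key="YOUR_FUGU_BETA_KEY",
)

# fugu-mini targets lower latency; fugu-ultra targets heavier reasoning.
response = client.chat.completions.create(
    model="fugu-ultra",
    messages=[
        {"role": "user", "content": "Diagnose this failing test and propose a patch."}
    ],
)
print(response.choices[0].message.content)
```

If this drop-in compatibility holds, teams could A/B Fugu against their current provider by swapping the base URL and model name, which is presumably the adoption path Sakana is counting on.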
The Sakana AI account usually uses X to turn its research agenda into concrete product or benchmark claims, and this post fits that pattern. The company has spent the last year arguing that the most capable systems will be coordinated collections of models rather than one giant endpoint. The Fugu release ties the product directly to two ICLR 2026 papers, Trinity and Conductor, which frame orchestration itself as something a small controller model can learn. One notable detail from the blog: Sakana says Fugu can recursively call itself, turning orchestration depth into a test-time compute dial instead of a fixed workflow.
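The blog does not describe the recursion mechanism, but the shape of the idea is easy to sketch: a controller that either answers a task directly or decomposes it and recurses with a smaller budget, so depth becomes the compute knob. The sketch below is a conceptual illustration only, not Sakana's implementation; the call_model stub and the decomposition protocol are invented for the example:

```python
# Conceptual sketch of recursion-as-a-compute-dial. Not Sakana's code:
# call_model() is a stand-in that echoes its input so the sketch runs.
def call_model(model: str, prompt: str) -> str:
    # In a real system this would be a provider API call.
    return f"[{model}] {prompt[:60]}"

def orchestrate(task: str, depth: int) -> str:
    """Answer directly, or split the task and recurse with a smaller budget."""
    if depth == 0:
        # Budget exhausted: answer in one shot with the fast model.
        return call_model("fugu-mini", task)
    # A controller decides whether the task is worth decomposing.
    plan = call_model("controller", f"Split into subtasks or reply ATOMIC: {task}")
    if plan.strip().endswith("ATOMIC"):
        return call_model("fugu-ultra", task)
    # Each subtask gets a shallower budget; results are merged at the end.
    partials = [orchestrate(line, depth - 1) for line in plan.splitlines() if line.strip()]
    return call_model("fugu-ultra", f"Combine answers for '{task}':\n" + "\n".join(partials))

# depth=0 is a single model call; each extra level of depth buys more
# decomposition at the cost of more calls -- test-time compute as a dial.
print(orchestrate("Refactor the payment module and add tests", depth=2))
```

On this reading, "orchestration depth" is just a recursion budget: spending more of it trades latency and cost for broader task decomposition, analogous to longer chains of thought in a single model.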
What to watch next is whether outside beta users can reproduce the benchmark edge and whether Sakana discloses pricing, model-pool composition, and failure cases as the beta expands. If those scores hold up in real coding and scientific workflows, Fugu becomes more than another wrapper on frontier APIs. It becomes a live test of whether orchestration can be sold as a model category of its own.
Related Articles
IBM Research’s VAKRA moves agent evaluation from static Q&A into executable tool environments. With 8,000+ locally hosted APIs across 62 domains and 3-7 step reasoning chains, the benchmark finds a gap between surface tool use and reliable enterprise agents.
LocalLLaMA did not just vent about weaker models; the thread turned the feeling into questions about provider routing, quantization, peak-time behavior, and how to prove a silent downgrade. The evidence is not settled, but the anxiety is real.
The r/singularity thread did not just react to Opus 4.7 scoring 41.0% where Opus 4.6 scored 94.7%. The interesting part was the community trying to separate real capability loss from refusal behavior, routing, and benchmark interpretation.