Sakana Fugu Opens Beta With 54.2 SWE-Pro and OpenAI-Style API
Original: We’re launching the beta for our new commercial AI product: Sakana Fugu, a multi-agent orchestration system! View original →
Sakana AI is moving multi-agent orchestration out of the lab demo phase and into a commercial API, which matters because most teams still wire model routing together by hand. In a new X post, the Tokyo lab says Sakana Fugu is entering beta as a system that can choose and coordinate frontier models automatically instead of forcing developers to manage separate providers, API keys, and brittle prompt logic.
“We’re launching the beta for our new commercial AI product: Sakana Fugu, a multi-agent orchestration system,” the team wrote on X, adding that Fugu hit SOTA on SWE-Pro, GPQA-D, and ALE-Bench.
The linked official blog post provides the harder numbers. Sakana says fugu-ultra reaches 95.1 on GPQAD, 93.2 on LCBv6, and 54.2 on SWEPro. In the same table, Gemini 3.1 high scores 94.4 on GPQAD and GPT 5.4 high scores 51.2 on SWEPro, while Anthropic’s cited Opus 4.6 max score on SWEPro is 53.4. Sakana is also pitching the product as easy to slot into existing stacks: the beta uses OpenAI-format endpoints and comes in two modes, fugu-mini for lower latency and fugu-ultra for heavier reasoning work.
The Sakana AI account usually uses X to turn its research agenda into concrete product or benchmark claims, and this post fits that pattern. The company has spent the last year arguing that the most capable systems will be coordinated collections of models rather than one giant endpoint. The Fugu release ties the product directly to two ICLR 2026 papers, Trinity and Conductor, which frame orchestration itself as something a small controller model can learn. One notable detail from the blog: Sakana says Fugu can recursively call itself, turning orchestration depth into a test-time compute dial instead of a fixed workflow.
What to watch next is whether outside beta users can reproduce the benchmark edge and whether Sakana discloses pricing, model-pool composition, and failure cases as the test expands. If those scores hold up in real coding and scientific workflows, Fugu becomes more than another wrapper on frontier APIs. It becomes a live test of whether orchestration can be sold as a model category of its own.
Related Articles
OpenAI announced GPT-5 on 2025-08-07 for both ChatGPT and API usage. The launch highlights include a reported 45% hallucination reduction vs GPT-4o and major benchmark gains such as HealthBench Hard 44.6.
xAI is turning voice agents into production software, not a demo. Grok Voice Think Fast 1.0 tops τ-voice Bench, supports 25+ languages, and xAI says the same stack is driving a 20% sales conversion and 70% support resolution flow at Starlink.
OpenAI launched GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — new voice API models covering live reasoning, real-time translation across 70+ languages, and streaming transcription. The Realtime API is now generally available for production use.