Gemini 3 Deep Think Expands From Benchmarks to Science and Engineering Workflows
Original: Gemini 3 Deep Think
What Google announced
On February 12, 2026, Google published a major update to Gemini 3 Deep Think, its specialized reasoning mode. The company positions this release as a model update aimed at harder science, research, and engineering tasks rather than only consumer chatbot use. According to the post, the rollout starts in the Gemini app for Google AI Ultra subscribers, and selected organizations can request early access through the Gemini API program.
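For readers curious what Gemini API access looks like in practice, here is a minimal sketch of building a `generateContent`-style REST request. The request-body shape (`contents` → `parts` → `text`) follows the public Gemini API convention, but the model identifier `gemini-3-deep-think` is an assumption: the post does not name an API model string for Deep Think.

```python
import json

# v1beta generateContent endpoint pattern used by the Gemini REST API.
API_URL_TEMPLATE = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "{model}:generateContent?key={api_key}"
)


def build_request(model: str, prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for a generateContent call.

    The model id passed in is hypothetical; substitute whatever
    identifier the early-access program actually exposes.
    """
    url = API_URL_TEMPLATE.format(model=model, api_key="YOUR_API_KEY")
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body


url, body = build_request(
    "gemini-3-deep-think",  # assumed model name, not confirmed by the post
    "Check this proof for logical gaps.",
)
print(body)
```

Actually sending the request (e.g. via `urllib.request` or the `google-genai` SDK) would require a valid API key and early-access entitlement, which is the gate the post describes.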
Claimed benchmark gains
Google reported several headline numbers for the updated Deep Think mode: 48.4% on Humanity’s Last Exam (without tools), 84.6% on ARC-AGI-2 (stated as verified by the ARC Prize Foundation), Codeforces Elo 3455, and gold-medal-level performance on International Math Olympiad 2025 tasks. The article also claims stronger science performance, including gold-medal-level results on written sections of the 2025 International Physics Olympiad and Chemistry Olympiad, plus 50.5% on CMT-Benchmark for theoretical physics.
Early tester examples
The post includes concrete pilot stories: Rutgers mathematician Lisa Carbone reportedly used Deep Think to identify a subtle logical flaw in a technical math paper; Duke University’s Wang Lab used it to design crystal-growth recipes and reached thin-film targets above 100 μm; and a Google hardware R&D lead tested the system on physical component design tasks. These are all vendor-reported examples, but they show where Google is trying to position Deep Think: a mix of scientific reasoning and practical engineering output.
Why the HN community reacted strongly
The Hacker News thread for this item had passed one thousand points and hundreds of comments at crawl time, with discussion centering on reproducibility, benchmark validity, and access controls for advanced reasoning models. The core takeaway is that Google is now pairing benchmark signaling with workflow distribution through app and API channels, a combination likely to shape competition for enterprise and research adoption in 2026.
Related Articles
Google AI Developers announced that Gemini 3.1 Flash-Lite is rolling out in preview via the Gemini API and Google AI Studio. The post positions it as the fastest and most cost-efficient model in the Gemini 3 line, now adding dynamic thinking for task-adaptive reasoning.
Why it matters: this is one of the first external benchmark reads to land right after the GPT-5.5 launch. Artificial Analysis said GPT-5.5 pulled 3 points ahead on its Intelligence Index, though running the full index became roughly 20% more expensive.
Sakana AI is trying to sell orchestration itself as a model product, not just a prompt hack around other APIs. In its beta table, fugu-ultra posts 54.2 on SWEPro and 95.1 on GPQAD while shipping behind an OpenAI-compatible API.