Reddit Highlights Gemini 3 Deep Think Upgrade for Science and Engineering
Original source: Google’s announcement, “Gemini 3 Deep Think: Advancing science, research and engineering.”
How the update spread in community channels
A popular r/singularity thread amplified Google’s new Gemini 3 Deep Think announcement. At collection time, the post had a score of 675 and 51 comments. The linked primary source is Google’s official article, Gemini 3 Deep Think: Advancing science, research and engineering, which frames Deep Think as a specialized reasoning mode for difficult research and engineering workflows.
Numbers Google published
Google reports several headline metrics for the updated mode: 48.4% on Humanity’s Last Exam without tools, 84.6% on ARC-AGI-2 (noted as verified by the ARC Prize Foundation), and a Codeforces Elo of 3455. Google also states gold-medal-level performance on the 2025 International Math Olympiad. Beyond math and coding, the announcement adds claims for science-heavy evaluation settings, including gold-medal-level written performance on the 2025 Physics and Chemistry Olympiads and 50.5% on CMT-Benchmark.
Availability and deployment signal
According to the announcement, the upgraded Deep Think is available in the Gemini app for Google AI Ultra subscribers. Google also opened an early-access path through the Gemini API for researchers, engineers, and enterprises. The write-up includes applied examples: identifying a subtle logical flaw in a technical math paper at Rutgers and helping optimize a crystal-growth recipe at Duke University’s Wang Lab for thin films larger than 100 μm.
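For teams weighing the early-access path, the public Gemini API exposes models through a `generateContent` REST endpoint with a `contents`/`parts` JSON body. A minimal sketch of building such a request follows; note that the model identifier below is a placeholder, since the announcement does not name the Deep Think model id.

```python
import json

# Placeholder model id -- the actual Deep Think identifier is not given in
# the announcement; substitute whatever id Google's early-access docs list.
MODEL_ID = "gemini-3-deep-think"

# The public Gemini REST API exposes models under
# /v1beta/models/{model}:generateContent
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL_ID}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Build the JSON body for a generateContent call."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]}
        ]
    }

payload = build_request(
    "Identify any logical gap in the following proof sketch."
)
print(json.dumps(payload, indent=2))
```

In practice the request would be sent as a POST with an API key; the sketch stops at the payload so that the request shape, not transport details, is the focus.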
Why this Reddit post mattered
The thread gained traction because it combined benchmark-oriented claims with a concrete API access path. That pairing matters for teams evaluating whether frontier reasoning models can move from demo-level tasks into reproducible science and engineering pipelines. In practice, production value will depend less on headline scores alone and more on domain-specific validation, failure analysis, and tool integration quality. Still, this release is a clear signal that specialized reasoning modes are being positioned as operational infrastructure rather than one-off showcase features.
Related Articles
Why it matters: an open-weight 27B dense model is now being pitched against much larger coding systems on real agent tasks. Qwen’s own model card lists SWE-bench Verified at 77.2 for Qwen3.6-27B versus 76.2 for Qwen3.5-397B-A17B, with Apache 2.0 licensing.
Why it matters: this is one of the first external benchmark reads to land right after the GPT-5.5 launch. Artificial Analysis said GPT-5.5 moved 3 points clear on its Intelligence Index, while the full index run still became roughly 20% more expensive.
Why it matters: inference cost is now a product constraint, not only an infrastructure problem. Cohere said its W4A8 path in vLLM is up to 58% faster on time to first token (TTFT) and 45% faster on time per output token (TPOT) versus W4A16 on Hopper.