Google’s I/O 2026 AI story is about distribution as much as models. Gemini 3.5 Flash is now generally available across API, Antigravity, Android Studio, enterprise tools, Search, and the Gemini app, while Gemini Omni Flash brings video generation into the same push.
Cognition is arguing that coding agents do not have to collapse into model-lab features. It raised more than $1B at a $26B valuation, with Devin’s run-rate revenue reaching $492M.
The Claude story is no longer only about model quality. Anthropic says its Series H raised $65B at a $965B post-money valuation, while run-rate revenue crossed $47B earlier in May.
Innovent Biologics (01801.HK) rose about 11% after Pfizer $PFE agreed to a global oncology collaboration worth up to $10.5B. The package includes a $650M upfront payment and up to $9.85B in milestone payments.
International Flavors & Fragrances $IFF agreed to sell its Food Ingredients business to CVC for about $4.3B. The unit generated nearly $3.1B of 2025 sales and about $430M of EBITDA.
Samsung Electronics (005930.KS) rose as much as 6.5% after shipping 12-layer HBM4E samples to major global customers. The company says the chip reaches up to 16Gbps and targets next-generation AI workloads.
Gap $GAP fell 14% after Q1 net sales of $3.5B missed consensus and management narrowed its FY2026 net sales growth guidance to 1%-2%. Adjusted EPS was $0.38, while Old Navy and Athleta continued to weigh on the top line.
Dell $DELL rose 39% after fiscal Q1 revenue reached $43.8B, up 88% year over year, with AI server revenue at $16.1B. The company lifted FY2027 guidance to $165B-$169B in revenue and $17.90 adjusted EPS.
Claude Opus 4.8 now has a fast mode that runs the same model at roughly 2.5x speed. Anthropic says the mode is three times cheaper than before, which changes the cost equation for long agent sessions.
Claude Opus 4.8 is showing its strongest early signal in agentic work, not only coding. Artificial Analysis says the model scored 1890 on GDPval-AA, 121 points ahead of GPT-5.5 xhigh.
LocalLLaMA readers noticed the infrastructure lesson: Zai claimed 15% more GPU inference throughput and 40.6% lower first-token P99 latency with the same GPUs, model, and software stack.
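For readers who want to check numbers like these on their own hardware, here is a minimal sketch of the two calculations involved: a P99 over time-to-first-token samples and a relative change between two runs. The function names and the nearest-rank percentile method are assumptions made for illustration; the source does not say how Zai measured either figure.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. This is an assumed method, not Zai's."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so p=0 still returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def relative_change(before, after):
    """Fractional change from before to after (0.15 means +15%)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before
```

Feed `percentile` the per-request time-to-first-token samples from each run with `p=99`, then pass the two results to `relative_change`. A negative value means latency dropped; a positive value on tokens per second means throughput rose.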
LocalLLaMA readers quickly turned the story into an operator checklist: check Starlette, FastAPI, vLLM, LiteLLM, MCP servers, and anything else exposed to the internet.
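A first pass at that checklist can be as simple as listing what is installed in each Python environment. The sketch below uses only the standard library. The package list mirrors the names above, and `mcp` is an assumption about which distribution your MCP server uses. It reports versions only: it does not know which versions are affected, so compare the output against the relevant advisories yourself.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names for the stack named above. "mcp" is assumed to be
# the MCP Python SDK; adjust the list to match your own deployment.
PACKAGES = ["starlette", "fastapi", "vllm", "litellm", "mcp"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:10s} {ver or 'not installed'}")
```

Run it inside every virtualenv or container image that serves traffic, since each can pin a different version.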