A user created a fully playable space exploration game using only natural language instructions to Gemini 3.1 Pro over a few hours. The AI handled performance optimization, soundtrack generation, and UI design entirely from plain language requests, producing around 1,800 lines of HTML code.
Anthropic's Claude Sonnet 4.6, released February 17, delivers Opus 4.5-level performance at Sonnet pricing with a 1M-token context window in beta, and becomes the new default for Free and Pro users.
Google DeepMind has released Gemini 3.1 Pro with more than double the reasoning performance of Gemini 3 Pro. The model scores 77.1% on ARC-AGI-2 (up from 31.1%), 80.6% on SWE-bench Verified, and tops 12 of 18 tracked benchmarks, with pricing unchanged at $2/$12 per million tokens.
Taalas has released an ASIC chip that physically etches Llama 3.1 8B model weights into silicon, achieving 17,000 tokens per second—10x faster, 10x cheaper, and 10x more power-efficient than GPU-based inference systems.
At the India AI Summit on February 17, Cohere released Tiny Aya, a family of 3.35B-parameter open-weight multilingual models supporting 70+ languages that run offline on standard laptops, targeting global language accessibility.
ByteDance released Doubao 2.0 ahead of Lunar New Year, claiming parity with GPT-5.2 and Gemini 3 Pro, citing a 98.3 score on AIME 2025 and a 3020 Codeforces rating, at pricing 10x cheaper than Western rivals.
Claude Opus 4.6 achieved a 50%-time-horizon of approximately 14.5 hours on METR's software task benchmark — beating all predictions and suggesting a doubling time of under 3 months for AI task capabilities.
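Taken at face value, those two numbers imply a simple exponential trajectory. The sketch below is illustrative arithmetic only, assuming a clean 3-month doubling time; it is not METR's actual methodology:

```python
def projected_horizon(current_hours, doubling_months, months_ahead):
    """Extrapolate a 50%-time-horizon assuming steady exponential doubling."""
    return current_hours * 2 ** (months_ahead / doubling_months)

# ~14.5 h today with a 3-month doubling time implies ~116 h in 9 months:
print(projected_horizon(14.5, 3, 9))  # → 116.0
```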
A new open-source project called ntransformer enables running the 140GB Llama 3.1 70B model on a single consumer RTX 3090 by streaming weights directly from NVMe storage to GPU, completely bypassing CPU RAM.
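The core trick is keeping only one layer's weights resident at a time. The toy sketch below illustrates that streaming pattern with a memory-mapped file; ntransformer's real pipeline reportedly moves weights NVMe-to-GPU directly (GPUDirect-style), which this stdlib-only version does not attempt, and the layer sizes here are made up:

```python
import mmap
import os
import tempfile

LAYERS = 4
BYTES_PER_LAYER = 1024 * 4  # e.g. 1024 float32 weights per toy "layer"

def stream_layers(path):
    """Yield one layer's raw weight bytes at a time; the checkpoint file
    is memory-mapped, never read into RAM as a whole."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for i in range(LAYERS):
            start = i * BYTES_PER_LAYER
            yield bytes(m[start:start + BYTES_PER_LAYER])

# Toy demo: a fake 4-layer checkpoint on disk.
fd, ckpt = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(LAYERS * BYTES_PER_LAYER))
sizes = [len(layer) for layer in stream_layers(ckpt)]
os.remove(ckpt)
print(sizes)  # → [4096, 4096, 4096, 4096]
```

At real scale the same loop would copy each layer's slice to the GPU, run it, and discard it before fetching the next, so peak host memory stays far below the 140GB checkpoint size.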
Andrej Karpathy coined a new term for OpenClaw-like AI agent systems: "Claws." Just as LLM agents were a new layer on top of LLMs, Claws provide orchestration, scheduling, persistent context, and tool calls on top of LLM agents.
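The layering can be sketched in a few lines. Everything below is hypothetical: the `Claw` class, the agent and tool callables, and the FIFO scheduler are illustrative stand-ins, not any real OpenClaw API:

```python
class Claw:
    """Toy orchestration shell: scheduling, persistent context, and tool
    dispatch wrapped around a bare agent callable."""

    def __init__(self, agent, tools):
        self.agent = agent      # callable: (task, context, call_tool) -> result
        self.tools = tools      # tool name -> callable
        self.context = {}       # persists across tasks (the "memory" layer)
        self.queue = []         # simple FIFO scheduler

    def schedule(self, task):
        self.queue.append(task)

    def call_tool(self, name, *args):
        return self.tools[name](*args)

    def run(self):
        """Drain the queue, handing each task to the agent with shared context."""
        results = []
        while self.queue:
            task = self.queue.pop(0)
            result = self.agent(task, self.context, self.call_tool)
            self.context[task] = result  # persist outcome for later tasks
            results.append(result)
        return results

# Toy agent that uses a tool and sees prior results via the shared context.
def toy_agent(task, context, call_tool):
    return call_tool("upper", task) + f" (seen {len(context)} before)"

claw = Claw(toy_agent, {"upper": str.upper})
claw.schedule("write tests")
claw.schedule("fix bug")
print(claw.run())  # → ['WRITE TESTS (seen 0 before)', 'FIX BUG (seen 1 before)']
```

The point of the sketch: the agent itself stays stateless, while the Claw supplies the state, sequencing, and tool access, mirroring how agents themselves wrap stateless LLM calls.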
Claude Code has grown to over $2.5 billion in annualized run-rate revenue as of February 2026, more than double the figure from its first six months on the market. The AI coding agent now accounts for over half of all enterprise spending on Anthropic, and users average 20 hours per week with the product.
xAI released Grok 4.20 as a public beta on February 17, introducing a continuous post-deployment learning architecture that updates the model weekly from user feedback. The release also adds a four-agent collaboration system and medical document analysis via photo upload.
Anthropic released Claude Code Security on February 20, a research preview that uses Claude Opus 4.6 to reason about codebases like a human security researcher, finding over 500 previously undetected vulnerabilities in production open-source projects. The launch sent cybersecurity stocks tumbling as much as 9%.