Startup Taalas proposes baking an LLM's entire weights and architecture into custom ASICs, claiming more than 17,000 tokens per second per user, sub-millisecond latency, and 20x lower cost than cloud inference, all within a 60-day chip production cycle.
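As a quick sanity check (my own arithmetic, not from the announcement), the claimed per-user throughput can be related to per-token latency like this:

```python
# Illustrative arithmetic only: relating Taalas's claimed per-user
# decode rate to the time each token takes to generate.
tokens_per_second = 17_000  # claimed per-user throughput

ms_per_token = 1000 / tokens_per_second
print(f"{ms_per_token:.3f} ms per token")  # → 0.059 ms per token

# At that rate, a 500-token answer streams in roughly:
print(f"{500 / tokens_per_second * 1000:.0f} ms")  # → 29 ms
```

So the sub-millisecond figure is plausible as per-token generation time, well under 1 ms per token at the claimed rate.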
Ollama 0.17, released February 22, introduces a new native inference engine replacing llama.cpp server mode, delivering up to 40% faster prompt processing and 18% faster token generation on NVIDIA GPUs, plus improved multi-GPU tensor parallelism and AMD RDNA 4 support.
Claude Sonnet 4.6, released February 17, delivers dramatically improved coding and computer use (72.5% on OSWorld—a nearly fivefold improvement) with a 1M token context window in beta, at unchanged pricing from Sonnet 4.5.
Alibaba launched Qwen3.5, a 397B-parameter open-weight multimodal model supporting 201 languages. The company claims it outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 on benchmarks, while costing 60% less than its predecessor.
Google's Gemini 3.1 Pro achieves 77.1% on ARC-AGI-2—more than doubling the previous Gemini 3 Pro's score. The mid-cycle upgrade brings Deep Think-level reasoning capabilities to all users and developers.
The Qwen research team has confirmed in a published paper that the GPQA and HLE (Humanity's Last Exam) benchmark datasets contain serious quality issues, including OCR errors, incorrect gold-standard answers, and unverifiable questions, casting doubt on the reliability of current AI model evaluations.
A user created a fully playable space exploration game in a few hours using only natural-language instructions to Gemini 3.1 Pro. The model handled performance optimization, soundtrack generation, and UI design entirely from plain-language requests, producing around 1,800 lines of HTML.
Anthropic's Claude Sonnet 4.6, released February 17, delivers Opus 4.5-level performance at Sonnet pricing with a 1M-token context window in beta, and becomes the new default for Free and Pro users.
Google DeepMind has released Gemini 3.1 Pro, more than doubling Gemini 3 Pro's reasoning performance. The model scores 77.1% on ARC-AGI-2 (up from 31.1%), 80.6% on SWE-bench Verified, and leads on 12 of 18 tracked benchmarks at unchanged $2/$12-per-million-token pricing.
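For context, an illustrative sketch (not from the announcement; the request sizes here are assumptions) of what $2/$12 per million tokens means per request:

```python
# Illustrative cost arithmetic for the stated $2 (input) / $12 (output)
# per-million-token pricing. The example request sizes are assumptions.
INPUT_PRICE = 2.00 / 1_000_000    # USD per input token
OUTPUT_PRICE = 12.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the stated rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a 4,000-token prompt with a 1,000-token reply:
print(f"${request_cost(4_000, 1_000):.4f}")  # → $0.0200
```

At those rates, output tokens dominate cost once a reply exceeds about a sixth of the prompt length.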
Taalas has released an ASIC that physically etches the Llama 3.1 8B model weights into silicon, achieving 17,000 tokens per second, which the company says is 10x faster, 10x cheaper, and 10x more power-efficient than GPU-based inference.
At the India AI Summit on February 17, Cohere released Tiny Aya, a family of 3.35B-parameter open-weight multilingual models that support 70+ languages and run offline on standard laptops, targeting global language accessibility.
Ahead of Lunar New Year, ByteDance released Doubao 2.0, claiming parity with GPT-5.2 and Gemini 3 Pro on the strength of a 98.3 on AIME 2025 and a 3020 Codeforces rating, at pricing 10x cheaper than Western rivals.