Anthropic introduced Claude Sonnet 4.6 with a 1M token context window (beta), stronger coding/computer-use performance, and unchanged API pricing at $3/$15 per million tokens.
Anthropic announced Claude Sonnet 4.6 on February 17, 2026, positioning it as a full upgrade across coding, computer use, and long-context reasoning. The model becomes the default for Free and Pro users and keeps Sonnet 4.5 API pricing at $3/$15 per million tokens.
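At those rates, per-request cost is simple arithmetic. A minimal sketch of the math; the token counts are illustrative, and note that requests beyond a provider's standard context window sometimes carry premium rates not quoted in the announcement:

```python
# Request cost at the published headline rates:
# $3 per 1M input tokens, $15 per 1M output tokens.
INPUT_RATE = 3.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single API call."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a hypothetical call near the 1M-token beta window.
print(f"${request_cost(900_000, 4_000):.2f}")  # -> $2.76
```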
A high-scoring r/LocalLLaMA thread surfaced Qwen3.5-397B-A17B, an open-weight multimodal model whose Hugging Face model card lists 397B total parameters, 17B activated, and extended context up to roughly 1M tokens.
A Hacker News discussion highlights arXiv:2602.11988, which finds that repository context files such as AGENTS.md often reduce coding-agent task success while increasing inference cost by more than 20%.
OpenAI announced GPT-5.3 Codex Spark on February 12, 2026, positioning it as a coding-focused model optimized for practical throughput and cost efficiency. The company reports lower latency and lower per-token cost versus GPT-5.2 while maintaining strong benchmark results.
On February 12, 2026, OpenAI introduced ChatGPT deep research, an agentic workflow designed to run multi-step web investigations and produce citation-backed reports. The release targets higher-value knowledge work where traceable evidence matters as much as speed.
An r/LocalLLaMA post on Qwen3.5 gained 123 upvotes and pointed directly to the public weights and model documentation. The linked card confirms the key specs: 397B total parameters, 17B activated, and a 262,144-token native context length.
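The total-versus-activated split is what makes a model like this legible economically: weight storage scales with the 397B total, while per-token compute scales with the 17B active. A back-of-envelope sketch using standard approximations (2 bytes per bf16 parameter, roughly 2 FLOPs per active parameter per decoded token); the constants come from the card's headline specs, but the formulas are generic estimates, not figures from the card:

```python
# Rough MoE sizing from Qwen3.5-397B-A17B's headline specs.
TOTAL_PARAMS = 397e9   # all experts must be resident for serving
ACTIVE_PARAMS = 17e9   # parameters actually used per token

# Memory: bf16 stores 2 bytes per parameter.
weights_gb = TOTAL_PARAMS * 2 / 1e9
print(f"bf16 weights: ~{weights_gb:.0f} GB")             # ~794 GB

# Compute: a forward pass costs ~2 FLOPs per active parameter per token.
flops_per_token = 2 * ACTIVE_PARAMS
print(f"~{flops_per_token / 1e9:.0f} GFLOPs per token")  # ~34 GFLOPs

# So the model decodes with roughly the arithmetic cost of a 17B dense
# model while needing the memory footprint of a 397B one (before the
# KV cache, which grows with the 262,144-token native context).
```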
A Docker guide on running NanoClaw inside a Shell Sandbox reached 102 points on Hacker News, highlighting a practical pattern for isolating the agent runtime, limiting filesystem exposure, and keeping API keys out of the guest environment.
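That isolation pattern can be approximated with stock Docker flags: a read-only root filesystem, no network, dropped capabilities, and a read-only project mount, with secrets kept on the host rather than exported into the container. A minimal sketch; the nanoclaw image name and the paths are placeholders, not taken from the guide:

```python
import subprocess

# Launch an agent container with a hardened profile. All flags are
# standard Docker options; "nanoclaw:latest" and the mount paths
# are hypothetical placeholders.
cmd = [
    "docker", "run", "--rm",
    "--read-only",                  # immutable root filesystem
    "--cap-drop", "ALL",            # drop all Linux capabilities
    "--pids-limit", "256",          # cap process count
    "--memory", "4g",               # cap memory
    "--network", "none",            # no network; relax if egress is needed
    "-v", "/srv/project:/work:ro",  # expose the repo read-only
    "--tmpfs", "/tmp",              # scratch space despite read-only root
    # Note: no `-e API_KEY=...` -- secrets stay on the host; a host-side
    # proxy can attach credentials to outbound API calls instead.
    "nanoclaw:latest",
]
subprocess.run(cmd, check=True)
```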
A high-ranking r/singularity post shared Google’s Gemini 3 Deep Think update. The announcement includes benchmark claims such as 48.4% on Humanity’s Last Exam (without tools), 84.6% on ARC-AGI-2, and a Codeforces Elo of 3455, along with early access through the Gemini API.
A Hacker News post highlighted the SkillsBench paper, which evaluates agent skills across 86 tasks and 11 domains. Curated skills improved the average pass rate substantially, while self-generated skills showed no gain on average.
Google DeepMind announced Gemma Scope 2, extending open interpretability tooling to the full Gemma 3 family from 270M to 27B parameters. The company says the release involved roughly 110 petabytes of stored data and more than 1 trillion parameters trained in total.
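Gemma Scope releases are suites of sparse autoencoders (SAEs) that decompose a model's internal activations into sparse, more interpretable features. A minimal NumPy sketch of that forward pass, assuming the JumpReLU-style activation the original Gemma Scope used; the dimensions, thresholds, and random weights are illustrative stand-ins, and real use would load the released SAE weights for a specific Gemma 3 layer:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 2304, 16384   # illustrative sizes, not Gemma Scope 2's

# Randomly initialized stand-ins for released SAE weights.
W_enc = rng.normal(0, 0.02, (d_model, d_sae))
W_dec = rng.normal(0, 0.02, (d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)
theta = np.full(d_sae, 0.05)   # JumpReLU thresholds (assumed values)

def sae_forward(act: np.ndarray):
    """Encode an activation into sparse features, then reconstruct it."""
    pre = act @ W_enc + b_enc
    feats = np.where(pre > theta, pre, 0.0)  # JumpReLU: zero below threshold
    recon = feats @ W_dec + b_dec
    return feats, recon

act = rng.normal(0, 1, d_model)  # stand-in for one residual-stream activation
feats, recon = sae_forward(act)
print(f"{(feats > 0).sum()} of {d_sae} features active")
```

With trained weights and calibrated thresholds, only a small fraction of features fire per token, which is what makes the decomposition readable; the random stand-ins here only illustrate the shapes and dataflow.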
A high-engagement r/LocalLLaMA thread tracked the MiniMax-M2.5 release on Hugging Face. The model card emphasizes agentic coding/search benchmarks, runtime speedups, and aggressive cost positioning.