LLM

LLM Hacker News Feb 18, 2026 2 min read

Claude Sonnet 4.6 launched: 1M context, same pricing, stronger real-world automation

Anthropic introduced Claude Sonnet 4.6 with a 1M token context window (beta), stronger coding/computer-use performance, and unchanged API pricing at $3/$15 per million tokens.

#anthropic #claude #sonnet

LLM Feb 17, 2026 2 min read

Anthropic introduces Claude Sonnet 4.6 with 1M token context beta while holding API pricing flat

Anthropic announced Claude Sonnet 4.6 on February 17, 2026, positioning it as a full upgrade across coding, computer use, and long-context reasoning. The model becomes the default for Free/Pro users and keeps Sonnet 4.5 API pricing at $3/$15 per million tokens.

#anthropic #claude #sonnet-4-6

LLM Reddit Feb 17, 2026 1 min read

Reddit Signals Strong Developer Interest in Qwen3.5-397B-A17B Release

A high-scoring r/LocalLLaMA thread surfaced Qwen3.5-397B-A17B, an open-weight multimodal model whose Hugging Face model card lists 397B total parameters with 17B activated and an extended context of up to about 1M tokens.

#qwen3.5 #open-weights #multimodal

LLM Hacker News Feb 17, 2026 1 min read

HN Spotlight: New arXiv Study Questions Whether AGENTS.md Helps Coding Agents

A Hacker News discussion highlights arXiv:2602.11988, which finds that repository context files like AGENTS.md often reduce coding-agent task success while increasing inference cost by more than 20%.

#coding-agents #agents-md #swe-bench

LLM Feb 17, 2026 1 min read

OpenAI Introduces GPT-5.3 Codex Spark With a Lower-Latency, Lower-Cost Coding Profile

OpenAI announced GPT-5.3 Codex Spark on February 12, 2026, positioning it as a coding-focused model optimized for practical throughput and cost efficiency. The company reports lower latency and token cost versus GPT-5.2 while maintaining strong benchmark results.

#openai #gpt-5.3 #codex

LLM Feb 17, 2026 1 min read

OpenAI Launches ChatGPT Deep Research for Multi-Step Web Analysis

On February 12, 2026, OpenAI introduced ChatGPT deep research, an agentic workflow designed to run multi-step web investigations and produce citation-backed reports. The release targets higher-value knowledge work where traceable evidence matters as much as speed.

#openai #deep-research #chatgpt

LLM Reddit Feb 17, 2026 2 min read

Reddit Tracks Qwen3.5 Open-Weight Release with 397B-A17B Model Card Details

An r/LocalLLaMA post on Qwen3.5 gained 123 upvotes and pointed directly to public weights and model documentation. The linked card confirms key specs including 397B total parameters, 17B activated, and a native context length of 262,144 tokens.

#qwen #open-weight #multimodal

LLM Hacker News Feb 17, 2026 2 min read

Hacker News Spotlights Docker Shell Sandboxes for Safer NanoClaw Agent Deployments

A Docker guide on running NanoClaw inside a Shell Sandbox reached 102 points on Hacker News, highlighting a practical pattern for isolating the agent runtime, limiting filesystem exposure, and keeping API keys out of the guest environment.

#docker #sandboxing #ai-agents

LLM Reddit Feb 17, 2026 1 min read

Reddit Highlights Gemini 3 Deep Think Upgrade for Science and Engineering

A high-ranking r/singularity post shared Google’s Gemini 3 Deep Think update. The announcement includes benchmark claims such as 48.4% on Humanity’s Last Exam (without tools), 84.6% on ARC-AGI-2, and Codeforces Elo 3455, plus Gemini API early access.

#gemini #reasoning-models #benchmarks

LLM Hacker News Feb 17, 2026 1 min read

SkillsBench Finds Self-Generated Agent Skills Add No Average Benefit

A Hacker News post highlighted the SkillsBench paper, which evaluates agent skills across 86 tasks and 11 domains. Curated skills improved average pass rate substantially, while self-generated skills showed no average gain.

#llm-agents #benchmark #evaluation

LLM Feb 16, 2026 1 min read

Google DeepMind Releases Gemma Scope 2 Across Gemma 3 Models for Open Interpretability Research

Google DeepMind announced Gemma Scope 2, extending open interpretability tooling to the full Gemma 3 family from 270M to 27B parameters. The company says the release involved roughly 110 petabytes of stored data and over 1 trillion total trained parameters.

#gemma #interpretability #ai-safety

LLM Reddit Feb 16, 2026 2 min read

LocalLLaMA Spotlights MiniMax-M2.5 as Hugging Face Release Gains Traction

A high-engagement r/LocalLLaMA thread tracked the MiniMax-M2.5 release on Hugging Face. The model card emphasizes agentic coding/search benchmarks, runtime speedups, and aggressive cost positioning.

#minimax #llm #agents