A Hacker News discussion highlights arXiv:2602.11988, which finds that repository context files like AGENTS.md often reduced coding-agent task success while increasing inference cost by more than 20%.
LLM
RSS FeedOpenAI announced GPT-5.3 Codex Spark on February 12, 2026, positioning it as a coding-focused model optimized for practical throughput and cost efficiency. The company reports lower latency and token cost versus GPT-5.2 while maintaining strong benchmark results.
On February 12, 2026, OpenAI introduced ChatGPT deep research, an agentic workflow designed to run multi-step web investigations and produce citation-backed reports. The release targets higher-value knowledge work where traceable evidence matters as much as speed.
A r/LocalLLaMA post on Qwen3.5 gained 123 upvotes and pointed directly to public weights and model documentation. The linked card confirms key specs including 397B total parameters, 17B activated, and 262,144 native context length.
A Docker guide on running NanoClaw inside a Shell Sandbox reached 102 points on Hacker News, highlighting a practical pattern for isolating agent runtime, limiting filesystem exposure, and keeping API keys out of the guest environment.
A high-ranking r/singularity post shared Google’s Gemini 3 Deep Think update. The announcement includes benchmark claims such as 48.4% on Humanity’s Last Exam (without tools), 84.6% on ARC-AGI-2, and Codeforces Elo 3455, plus Gemini API early access.
A Hacker News post highlighted the SkillsBench paper, which evaluates agent skills across 86 tasks and 11 domains. Curated skills improved average pass rate substantially, while self-generated skills showed no average gain.
Google DeepMind announced Gemma Scope 2, extending open interpretability tooling to the full Gemma 3 family from 270M to 27B parameters. The company says the release involved roughly 110 Petabytes of stored data and over 1 trillion total trained parameters.
A high-engagement r/LocalLLaMA thread tracked the MiniMax-M2.5 release on Hugging Face. The model card emphasizes agentic coding/search benchmarks, runtime speedups, and aggressive cost positioning.
A Show HN post introduces Off Grid, an open-source Android/iOS app that runs chat, image generation, vision, and speech transcription entirely on-device without cloud data transfer.
OpenAI reports that, across more than one million ChatGPT conversations, the share of difficult interactions exceeding a human baseline increased roughly fourfold from September 2024 to January 2026. The company also shows large gains in case-interview and puzzle-style open tasks.
A Reddit thread amplified an Ars Technica report that Google detected a 100,000+ prompt extraction campaign against Gemini, reopening questions about distillation, defense, and IP boundaries.