xAI is turning Grok Build from a CLI-backed experience into a public API beta. The headline number is pricing: $1 per million input tokens and $2 per million output tokens for agentic coding workloads.
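To make the per-token rates concrete, here is a minimal cost sketch. It assumes simple linear billing at the quoted beta rates. The session size (2M input tokens, 200K output tokens) is a hypothetical illustration, not a figure from xAI.

```python
# Quoted Grok Build API beta rates, in USD per million tokens.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 2.00


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one session, assuming linear per-token billing."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical agentic session: 2M input tokens, 200K output tokens.
cost = session_cost(2_000_000, 200_000)
print(f"${cost:.2f}")  # $2.40
```

Agentic workloads tend to re-read large contexts, so input tokens usually dominate the total even though output tokens cost twice as much.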
xAI has released an early beta of Grok Build, an agentic CLI tool for coding, building apps, and automating workflows. It is available now to SuperGrok Heavy subscribers. The announcement drew more than 41 million views, a sign of broad interest in the tool.
Open-weight coding models that can run locally are still scarce. Poolside has moved into that space with Laguna XS.2, a mixture-of-experts (MoE) model with 33B total and 3B active parameters that fits on a single GPU. Its technical note claims 44.5% on SWE-bench Pro.
Hacker News seized on the wasted spend before the bug itself. A report claimed that Claude Managed Agents append a malware reminder to every file read and then sometimes refuse to edit code anyway. The thread grew into a broader argument about opaque token spend and whether agent harnesses deserve more scrutiny.
GitHub is no longer talking about routine uptime tuning. In its April 28 update, the company said a 10x capacity plan launched in October 2025 had to be reworked for 30x scale by February 2026, after recent incidents hit 230 repositories and 2,092 pull requests.
Hacker News largely looked past the headline that plan prices stay flat. The thread zeroed in on a simpler point: on April 27, 2026, GitHub admitted that long agentic coding sessions cannot be subsidized forever, and predictable Copilot costs are giving way to token math.
OpenAI is pushing harder into agentic work, not just chat. On the company's own evals, GPT-5.5 reaches 82.7% on Terminal-Bench 2.0, beats GPT-5.4 by 7.6 points, and uses fewer tokens in Codex.
OpenAI is pitching GPT-5.5 as more than a routine model refresh. With 82.7% on Terminal-Bench 2.0, 58.6% on SWE-Bench Pro, and a claim that it keeps GPT-5.4-level latency, the company is resetting expectations for long-running coding agents.
A large Hacker News thread turned a Claude Code quota complaint into a deeper argument about how prompt caching, background sessions, and auto-compacts behave inside 1M-context agent workflows. The author of the GitHub issue published usage logs from April 9, 2026, and the discussion quickly shifted from "limits feel worse" to cache accounting and quota transparency.
Hacker News picked up Z.ai's GLM-5.1 as a model aimed less at one-shot wins and more at sustained agentic work. Z.ai reports 58.4 on SWE-Bench Pro, 42.7 on NL2Repo, 66.5 on Terminal Bench 2.0, and long-horizon runs that keep improving through hundreds of iterations and thousands of tool calls.
Cursor has published a technical report for Composer 2, outlining a two-stage recipe of continued pretraining and large-scale reinforcement learning for agentic software engineering. The company says the model reaches 61.3 on CursorBench, 61.7 on Terminal-Bench, and 73.7 on SWE-bench Multilingual while keeping pricing at $0.50/M input and $2.50/M output tokens.
Anthropic said on March 30, 2026 that computer use is now available in Claude Code in research preview for Pro and Max plans. Claude Code docs say the feature lets Claude open apps, click through UI flows, and see the screen on macOS from the CLI, targeting native app testing, visual debugging, and other GUI-only tasks.