#coding-agents

LLM Hacker News Apr 16, 2026 1 min read

HN Sees Qwen3.6-35B-A3B as a Small Active-Parameter Bet for Coding Agents

HN latched onto the open-weight angle: a 35B MoE model with only 3B active parameters is interesting if it can actually carry coding-agent work. Qwen says Qwen3.6-35B-A3B improves sharply over Qwen3.5-35B-A3B, while commenters immediately moved to GGUF builds, Mac memory limits, and whether open-model-only benchmark tables are enough context.

#qwen #open-weights #coding-agents

LLM Apr 15, 2026 2 min read

LiteCoder pushes terminal agents to 31.5% on Terminal Bench Pro

LiteCoder is making a case that smaller coding agents still have room to climb, releasing terminal-focused models plus 11,255 trajectories and 602 Harbor environments. Its 30B model reaches 31.5% Pass@1 on Terminal Bench Pro, up from 22.0% in the preview.

#litecoder #coding-agents #benchmarks

LLM Apr 14, 2026 2 min read

GitHub Brings Copilot Cloud Agent Research and Coding Workflows to Mobile

GitHub has expanded Copilot cloud agent on GitHub Mobile beyond pull request review. Developers can now ask the agent to research a codebase, draft an implementation plan, edit on a branch, review diffs, and open a pull request from a phone when ready.

#github #copilot #mobile

LLM Apr 13, 2026 1 min read

Google pairs Docs MCP and Developer Skills to keep Gemini coding agents current

Google says coding agents often produce stale Gemini API code because model training data has a cutoff date, and is shipping Docs MCP plus Developer Skills as the fix. Google reports that, used together, the two reach a 96.3% pass rate with 63% fewer tokens per correct answer than vanilla prompting on its eval set.

#google #gemini-api #mcp

LLM Hacker News Apr 10, 2026 2 min read

Hacker News Zeroes In on Research-Driven Coding Agents

A Hacker News discussion focused on SkyPilot's argument that coding agents work better when they read papers and competing implementations before editing code. In the reported llama.cpp experiments, that research-first loop produced 5 viable optimizations and improved TinyLlama text generation by 15% on x86 and 5% on ARM for about $29.

#coding-agents #llama-cpp #skypilot

AI Hacker News Apr 7, 2026 2 min read

Launch HN Puts Freestyle’s Live-Forked Agent Sandboxes on the Map

A Launch HN post with around 260 points introduced Freestyle as infrastructure for coding agents, highlighting sub-second VM startup, live forking of running sandboxes, pause-and-resume persistence, built-in git hosting, and full Linux VMs intended for agent platforms rather than lightweight demo containers.

#coding-agents #sandbox #virtualization

LLM Hacker News Apr 6, 2026 2 min read

Hacker News Highlights a Six-Part Blueprint for Coding Agents

Sebastian Raschka's April 4, 2026 article argues that coding-agent quality is shaped as much by the harness as by the base model. He breaks the stack into six components: live repo context, prompt and cache reuse, structured tools, context reduction, session memory, and bounded subagents. Hacker News treated it as a practical framework for understanding why products like Codex and Claude Code feel stronger than plain chat.

#coding-agents #agent-harness #repo-context

LLM X/Twitter Apr 5, 2026 2 min read

Cursor details Composer 2’s training stack, from continued pretraining to real-world RL

Cursor said on March 26, 2026, that real-time reinforcement learning lets it ship improved Composer 2 checkpoints every five hours. Cursor’s March 27 technical report says the model combines continued pretraining on Kimi K2.5 with large-scale RL in realistic Cursor sessions, scores 61.3 on CursorBench, and runs on an asynchronous multi-region RL stack with large sandbox fleets.

#cursor #composer-2 #reinforcement-learning

AI Hacker News Apr 3, 2026 2 min read

Hacker News Pushes Cursor 3 as a Unified Workspace for Coding Agents

Cursor 3 reframes AI coding as multi-agent orchestration, combining local and cloud agents, multi-repo context, and PR-oriented workflows in a single interface.

#cursor #coding-agents #developer-tools

AI Reddit Mar 31, 2026 2 min read

r/singularity Debates Meta-Harness After Its TerminalBench 2 Lead Over Claude Code

An r/singularity post with 286 upvotes and 57 comments spotlighted Meta-Harness claiming a clear TerminalBench 2 lead over Claude Code. The discussion centered on what a harness is, whether AI-designed harnesses can beat manual iteration, and whether open models will get the same treatment.

#reddit #meta-harness #terminalbench-2

AI Reddit Mar 30, 2026 2 min read

r/singularity Highlights Cursor’s Five-Hour Real-Time RL Loop for Composer

A March 29 r/singularity thread amplified Cursor's claim that Composer checkpoints can now be trained from live user interactions and shipped every five hours, with reward-hacking fixes treated as part of the story rather than an afterthought.

#cursor #reinforcement-learning #coding-agents

LLM Hacker News Mar 28, 2026 2 min read

Hacker News spotlights ATLAS and the economics of local coding agents

A Hacker News post pushed ATLAS into the spotlight by framing a consumer-GPU coding agent as a serious cost challenger to hosted systems. The headline benchmark is interesting, but the repository itself makes clear that its 74.6% result is not a controlled head-to-head against Claude 4.5 Sonnet because the task counts and evaluation protocols differ.

#coding-agents #benchmarks #local-inference
