In an April 4 X post, GitHub drew fresh attention to Agentic Workflows, a technical-preview system that lets teams describe repository chores in Markdown and run them in GitHub Actions with coding agents. The underlying documentation says workflows default to read-only access and rely on reviewable safe outputs for write actions such as opening pull requests or posting issue comments.
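For a sense of the format, here is a hedged sketch of such a workflow file; the frontmatter keys (`on`, `permissions`, `safe-outputs`) mirror the pattern described in the preview's documentation, but treat the exact schema and trigger names as illustrative rather than authoritative:

```markdown
---
on:
  issues:
    types: [opened]
permissions: read-all      # the agent runs read-only by default
safe-outputs:
  add-comment:             # the only write path: a reviewable comment
---

When a new issue is opened, skim the repository's CONTRIBUTING.md and
post one short comment pointing the author at the relevant section.
```

The body is plain natural language; whatever `safe-outputs` declares is the agent's only write capability.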
A LocalLLaMA demo pointed to Parlor, which runs speech and vision understanding with Gemma 4 E2B and uses Kokoro for text-to-speech, all on-device. The README reports roughly 2.5-3.0 seconds of end-to-end latency and about 83 tokens/sec decode speed on an Apple M3 Pro.
A LocalLLaMA explainer argues that Gemma 4 E2B/E4B gain their efficiency from Per-Layer Embeddings. The key point is that many of those parameters behave more like large token lookup tables than always-active compute-heavy layers, which shifts the inference cost from always-on compute toward cheap memory lookups.
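A minimal PyTorch sketch of the general idea (dimensions and names invented for illustration, not Gemma's actual architecture) shows why such tables add parameters without adding per-token matmul compute:

```python
import torch
import torch.nn as nn

# Toy illustration of per-layer embeddings: each layer owns a lookup
# table indexed by token id, so its per-token cost is a memory gather,
# not a dense matrix multiply over all of those parameters.
vocab_size, n_layers, ple_dim = 32_000, 24, 256
per_layer_emb = nn.ModuleList(
    nn.Embedding(vocab_size, ple_dim) for _ in range(n_layers)
)

token_ids = torch.tensor([[17, 942, 8]])       # (batch, seq)
for layer in per_layer_emb:
    ple = layer(token_ids)                     # (batch, seq, ple_dim)
    # A real block would fuse `ple` into its hidden state here; only
    # the rows for the tokens in flight are ever read from the table.
```

Only three rows per table are touched for this three-token input, which is why such parameters can sit in slower memory without hurting decode speed the way dense weights would.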
A Show HN thread highlighted GuppyLM, a tiny 8.7M-parameter transformer with a 60K synthetic conversation dataset and Colab notebooks. The point is not state-of-the-art performance, but making the full LLM pipeline inspectable from data generation to inference.
Bankai, highlighted in LocalLLaMA, proposes post-training adaptation for true 1-bit LLMs by applying sparse XOR patches directly to binary weights. According to the GitHub repo and paper, patches around 1 KB changed Bonsai 8B behavior with zero inference overhead, fixed 4 of 17 held-out failures without breaking 13 already-correct cases, and could be applied or reverted with the same XOR operation in microseconds.
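The XOR mechanics fit in a few lines; this NumPy sketch uses invented names and sizes rather than the repo's actual patch format:

```python
import numpy as np

# 1-bit weights packed 8-per-byte; a sparse patch is (byte offset, bitmask).
weights = np.random.randint(0, 256, size=1 << 20, dtype=np.uint8)

idx = np.random.choice(weights.size, size=1024, replace=False)   # ~1 KB patch
mask = np.random.randint(1, 256, size=idx.size, dtype=np.uint8)  # bits to flip

def xor_patch(w: np.ndarray, idx: np.ndarray, mask: np.ndarray) -> None:
    # XOR is its own inverse, so one function both applies and reverts.
    w[idx] ^= mask

before = weights[idx].copy()
xor_patch(weights, idx, mask)    # apply: patched bits flip
xor_patch(weights, idx, mask)    # revert: weights are bit-identical again
assert np.array_equal(weights[idx], before)
```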
Andrej Karpathy's April 4, 2026 "LLM Wiki" gist proposes replacing one-shot retrieval with an interlinked wiki that an agent continuously maintains. Hacker News focused on the three-layer design of raw sources, wiki, and schema, plus the ingest, query, and lint loop that lets knowledge compound instead of being rediscovered from scratch for every prompt.
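A rough sketch of that loop, with every name hypothetical since the gist is a design note rather than code:

```python
import re

wiki: dict[str, str] = {}        # wiki layer: title -> body with [[links]]

def ingest(title: str, source_text: str) -> None:
    # Fold raw-source material into an existing page so knowledge
    # compounds instead of being rediscovered for every prompt.
    wiki[title] = (wiki.get(title, "") + "\n" + source_text).strip()

def query(topic: str) -> str:
    # Retrieval becomes a page lookup plus link-following.
    return wiki.get(topic, "")

def lint() -> list[str]:
    # Schema layer: report [[links]] that point at no existing page.
    return [
        f"{title} -> {link}"
        for title, body in wiki.items()
        for link in re.findall(r"\[\[(.+?)\]\]", body)
        if link not in wiki
    ]

ingest("retrieval", "Agents should consult [[the wiki]] before searching.")
print(lint())                    # ['retrieval -> the wiki']
```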
Sebastian Raschka's April 4, 2026 article argues that coding-agent quality is shaped as much by the harness as by the base model. He breaks the stack into six components: live repo context, prompt and cache reuse, structured tools, context reduction, session memory, and bounded subagents. Hacker News treated it as a practical framework for understanding why products like Codex and Claude Code feel stronger than plain chat.
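A schematic of how the six pieces fit together, with every function a trivial stand-in rather than any real product's API:

```python
def index_repo(repo: dict[str, str]) -> str:           # 1. live repo context
    return "\n".join(f"{p}: {len(s)} chars" for p, s in repo.items())

def compress(text: str, limit: int = 200) -> str:      # 4. context reduction
    return text[:limit]

def run_agent(task: str, repo: dict[str, str], max_steps: int = 3) -> str:
    memory: list[str] = []                             # 5. session memory
    context = index_repo(repo)
    for _ in range(max_steps):                         # 6. bounded sub-runs
        prompt = "\n".join([task, context, *memory])   # 2. prompt/cache reuse
        # 3. structured tools: a real harness dispatches a model-chosen
        # tool call here; this stub just "reads" one file deterministically.
        observation = repo.get("main.py", "")
        memory.append(compress(observation))
    return prompt

print(run_agent("fix the failing test", {"main.py": "print('hi')"}))
```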
Together Research says LLMs can patch faulty database query plans instead of regenerating them from scratch, and claims up to 4.78x speedups on some TPC-H and TPC-DS workloads. The tweet points to DBPlanBench, a DataFusion-based harness that exposes a physical operator graph to an LLM and uses iterative search to refine plan edits.
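A toy illustration of the patching idea (not DataFusion's or DBPlanBench's API; all names here are invented): edit one operator in an otherwise-valid plan tree, then keep the edit only if it measures faster.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    op: str                                  # e.g. "SortMergeJoin", "Scan"
    children: list["PlanNode"] = field(default_factory=list)

def patch(plan: PlanNode, path: list[int], new_op: str) -> None:
    # One edit: rewrite a single operator, leaving the rest of the
    # plan (and everything already correct about it) untouched.
    node = plan
    for i in path:
        node = node.children[i]
    node.op = new_op

plan = PlanNode("SortMergeJoin", [PlanNode("Scan"), PlanNode("Scan")])
patch(plan, [], "HashJoin")                  # an LLM-proposed operator swap
# An iterative-search harness would now time both plans, keep the
# faster one, and ask the model for the next edit.
```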
Cursor has published a technical report for Composer 2, outlining a two-stage recipe of continued pretraining and large-scale reinforcement learning for agentic software engineering. The company says the model reaches 61.3 on CursorBench, 61.7 on Terminal-Bench, and 73.7 on SWE-bench Multilingual while keeping pricing at $0.50/M input and $2.50/M output tokens.
A LocalLLaMA user compared Gemma 4 31B, Gemma 4 26B-A4B, and Qwen 3.5 27B across 30 blind prompts judged by Claude Opus 4.6. The result is not one clear winner but a clearer picture of trade-offs in reliability, verbosity, and category-specific strengths.
A fresh LocalLLaMA thread argues that some early Gemma 4 failures are really inference-stack bugs rather than model quality problems. By linking to active llama.cpp pull requests and to user reports of improved results after updating, the post reframes launch benchmarks as a full-stack issue.
The GitHub project Caveman claims it can cut output tokens by about 75% by stripping filler language while preserving code and technical terms. On Hacker News, developers are treating it as a serious experiment in reducing agent cost, latency, and verbosity.
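A toy version of the approach (Caveman's own implementation may differ; the filler list and function here are invented) is a rewrite pass that deletes boilerplate phrases but never touches fenced code:

```python
import re

# Phrases to drop; a real system would use a far larger list or a model.
FILLER = re.compile(
    r"(?:certainly|of course|it'?s worth noting that|i hope this helps)"
    r"[,.!]?\s*",
    re.IGNORECASE,
)

def strip_filler(text: str) -> str:
    # Split on fenced code blocks and rewrite only the prose segments,
    # so code and the technical terms inside it survive verbatim.
    parts = re.split(r"(```.*?```)", text, flags=re.DOTALL)
    return "".join(
        part if part.startswith("```") else FILLER.sub("", part)
        for part in parts
    )

sample = "Certainly! Here is the fix:\n```python\nx = 1\n```\nI hope this helps."
print(strip_filler(sample))   # prose trimmed, code block untouched
```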