LLM

LLM X/Twitter Apr 6, 2026 2 min read

GitHub spotlights Markdown-defined Agentic Workflows for repository automation

In an April 4 X post, GitHub drew fresh attention to Agentic Workflows, a technical-preview system that lets teams describe repository chores in Markdown and run them in GitHub Actions with coding agents. According to the project documentation, workflows run with read-only access by default and route write actions, such as opening pull requests or posting issue comments, through reviewable "safe outputs."

#github #agentic-workflows #github-actions

LLM Reddit Apr 6, 2026 2 min read

Reddit showcases Parlor, a real-time local voice-and-vision assistant powered by Gemma 4 E2B

A LocalLLaMA demo pointed to Parlor, which runs speech and vision understanding with Gemma 4 E2B and uses Kokoro for text-to-speech, all on-device. The README reports roughly 2.5-3.0 seconds of end-to-end latency and about 83 tokens/sec of decode speed on an Apple M3 Pro.

#llm #multimodal #edge-ai

LLM Reddit Apr 6, 2026 2 min read

LocalLLaMA digs into Gemma 4 Per-Layer Embeddings and why the small models behave differently

A LocalLLaMA explainer argues that Gemma 4 E2B/E4B get their efficiency from Per-Layer Embeddings. The key point is that many of those parameters act more like large per-token lookup tables than always-active, compute-heavy layers. That changes the inference trade-off: a lookup adds to the stored parameter count but reads only one row per token, so it costs memory rather than much compute.
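To make the lookup-versus-compute distinction concrete, here is a toy comparison. The sizes and function names are hypothetical and do not reflect Gemma 4's real configuration; the sketch only shows that an embedding table can store many parameters while each token touches a single row, whereas a dense layer touches every weight.

```python
import random

# Hypothetical toy sizes for illustration only; not Gemma 4's actual config.
VOCAB, PLE_DIM, HIDDEN = 1000, 8, 64

rng = random.Random(0)
ple_table = [[rng.uniform(-1, 1) for _ in range(PLE_DIM)] for _ in range(VOCAB)]
dense_w = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

def ple_lookup(token_id: int) -> list[float]:
    # A lookup reads one row of the table: no multiply-adds, just memory access.
    return ple_table[token_id]

def dense(x: list[float]) -> list[float]:
    # A dense layer multiplies the input by every weight, for every token.
    return [sum(w * xi for w, xi in zip(row, x)) for row in dense_w]

def per_token_cost() -> dict[str, int]:
    # Parameters stored in memory versus parameters actually used per token.
    return {
        "ple_params_stored": VOCAB * PLE_DIM,
        "ple_params_touched": PLE_DIM,
        "dense_params_stored": HIDDEN * HIDDEN,
        "dense_params_touched": HIDDEN * HIDDEN,
    }

print(per_token_cost())
```

With these toy sizes the table stores 8,000 parameters but each token reads only 8 of them, while the dense layer uses all 4,096 of its weights on every token.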

#llm #gemma #inference

LLM Hacker News Apr 6, 2026 2 min read

Hacker News spots GuppyLM, an 8.7M-parameter teaching LLM you can train in minutes

A Show HN thread highlighted GuppyLM, a tiny 8.7M-parameter transformer with a 60K synthetic conversation dataset and Colab notebooks. The goal is not state-of-the-art performance but making the full LLM pipeline, from data generation to inference, inspectable.

#llm #education #open-source

LLM Reddit Apr 6, 2026 2 min read

LocalLLaMA Spots "Bankai," an XOR Patch Method for True 1-Bit LLMs

Bankai, highlighted in LocalLLaMA, proposes post-training adaptation for true 1-bit LLMs by applying sparse XOR patches directly to binary weights. According to the GitHub repo and paper, patches of roughly 1 KB changed Bonsai 8B's behavior with zero inference overhead and fixed 4 of 17 held-out failures without breaking 13 already-correct cases. Because XOR is its own inverse, the same operation applies or reverts a patch, which takes microseconds.
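The apply/revert symmetry follows directly from XOR being self-inverse. The sketch below assumes a hypothetical storage format (binary weights packed eight per byte, bit 1 meaning +1 and bit 0 meaning -1) and a hypothetical sparse patch format mapping byte offsets to bit masks; it is not Bankai's actual file format or code.

```python
# Hypothetical format: each bit is one binary weight (1 -> +1, 0 -> -1).
# A patch maps a byte offset to a mask whose set bits mark weights to flip.

def apply_patch(weights: bytearray, patch: dict[int, int]) -> None:
    """XOR each masked byte in place; applying the same patch twice restores the original."""
    for offset, mask in patch.items():
        weights[offset] ^= mask

def unpack_signs(weights: bytearray) -> list[int]:
    # Expand packed bits into +1/-1 weights, least significant bit first.
    return [1 if (byte >> bit) & 1 else -1 for byte in weights for bit in range(8)]

weights = bytearray(b"\x0f\xf0\xaa\x55")
original = bytes(weights)
patch = {1: 0b0000_0001, 3: 0b1000_0000}  # flips exactly two weights

apply_patch(weights, patch)   # adapt the model
print(weights.hex())
apply_patch(weights, patch)   # the same call reverts it
print(bytes(weights) == original)
```

Because the patched weights keep the same size and layout, an inference kernel would run on them unchanged, which is consistent with the "zero inference overhead" claim.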

#bankai #1-bit-llms #bonsai-8b

LLM Hacker News Apr 6, 2026 2 min read

Hacker News Picks Up Karpathy's "LLM Wiki" Pattern for Persistent Knowledge Bases

Andrej Karpathy's April 4, 2026 "LLM Wiki" gist proposes replacing one-shot retrieval with an interlinked wiki that an agent continuously maintains. Hacker News focused on the three-layer design of raw sources, wiki, and schema, plus the ingest, query, and lint loop that lets knowledge compound instead of being rediscovered from scratch for every prompt.
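A minimal sketch of the ingest, query, and lint loop follows, assuming a hypothetical `[[Page Name]]` link syntax as the "schema" layer and a plain dictionary as the wiki layer. In the pattern described, an LLM agent would write and revise pages; here `ingest` simply appends text, and `lint` shows one possible health check (broken links and orphan pages). The class and method names are illustrative, not taken from the gist.

```python
import re
from dataclasses import dataclass, field

LINK = re.compile(r"\[\[([^\]]+)\]\]")  # hypothetical [[Page Name]] link convention

@dataclass
class Wiki:
    pages: dict[str, str] = field(default_factory=dict)          # wiki layer: title -> text
    sources: dict[str, list[str]] = field(default_factory=dict)  # title -> raw source ids

    def ingest(self, source_id: str, title: str, summary: str) -> None:
        # Stand-in for an agent updating a page from a new raw source.
        existing = self.pages.get(title, "")
        self.pages[title] = (existing + "\n" + summary).strip()
        self.sources.setdefault(title, []).append(source_id)

    def query(self, title: str) -> str:
        # Answer from the maintained page instead of re-retrieving raw sources.
        return self.pages.get(title, "")

    def lint(self) -> dict[str, list[str]]:
        # Flag links to missing pages and pages nothing links to.
        linked = {t for text in self.pages.values() for t in LINK.findall(text)}
        broken = sorted(t for t in linked if t not in self.pages)
        orphans = sorted(t for t in self.pages if t not in linked)
        return {"broken_links": broken, "orphans": orphans}

wiki = Wiki()
wiki.ingest("src-1", "Transformers", "Built on [[Attention]].")
wiki.ingest("src-2", "Attention", "Scores queries against keys; see [[KV Cache]].")
print(wiki.lint())  # {'broken_links': ['KV Cache'], 'orphans': ['Transformers']}
```

The lint result becomes the next round of work for the agent: write the missing page, or link the orphan from somewhere relevant, so the knowledge base improves over time rather than being rebuilt per prompt.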

#llm-wiki #persistent-rag #knowledge-management

LLM Hacker News Apr 6, 2026 2 min read

Hacker News Highlights a Six-Part Blueprint for Coding Agents

Sebastian Raschka's April 4, 2026 article argues that coding-agent quality is shaped as much by the harness as by the base model. He breaks the stack into six components: live repo context, prompt and cache reuse, structured tools, context reduction, session memory, and bounded subagents. Hacker News treated it as a practical framework for understanding why products like Codex and Claude Code feel stronger than plain chat.

#coding-agents #agent-harness #repo-context

LLM X/Twitter Apr 5, 2026 1 min read

Together Research says LLMs can repair bad database query plans

Together Research says LLMs can patch faulty database query plans instead of regenerating them from scratch, and claims up to 4.78x speedups on some TPC-H and TPC-DS workloads. The tweet points to DBPlanBench, a DataFusion-based harness that exposes a physical operator graph to an LLM and uses iterative search to refine plan edits.

#together-ai #dbplanbench #query-optimization

LLM X/Twitter Apr 5, 2026 1 min read

Cursor details Composer 2's training stack in a new technical report

Cursor has published a technical report for Composer 2, outlining a two-stage recipe of continued pretraining and large-scale reinforcement learning for agentic software engineering. The company says the model reaches 61.3 on CursorBench, 61.7 on Terminal-Bench, and 73.7 on SWE-bench Multilingual while keeping pricing at $0.50/M input and $2.50/M output tokens.
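The quoted prices translate into per-request cost with simple arithmetic. This sketch uses only the two list prices stated above; the token counts in the example are hypothetical, and it ignores any caching discounts or other pricing details the report may describe.

```python
# Composer 2 list prices as quoted above, in USD per million tokens.
INPUT_USD_PER_M = 0.50
OUTPUT_USD_PER_M = 2.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at list prices (no caching discounts)."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Hypothetical agent turn: 200K tokens of repo context in, 20K tokens out.
print(f"${request_cost(200_000, 20_000):.2f}")  # $0.15
```

Since output tokens cost five times as much as input tokens here, verbose agent responses weigh disproportionately on the bill.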

#cursor #composer-2 #coding-model

LLM Reddit Apr 5, 2026 2 min read

A LocalLLaMA blind eval finds Qwen 3.5 wins more matchups while Gemma 4 posts higher averages

A LocalLLaMA user compared Gemma 4 31B, Gemma 4 26B-A4B, and Qwen 3.5 27B across 30 blind prompts judged by Claude Opus 4.6. The result is not one clear winner but a more useful trade-off story around reliability, verbosity, and category-specific strengths.
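How can one model win more matchups while another posts a higher average? The sketch below uses made-up scores (not the thread's data) for two hypothetical models judged on the same four prompts: model A edges out B on most prompts, but one bad failure drags its mean below B's. Ties count as a win for neither side.

```python
from statistics import mean

def compare(a: list[float], b: list[float]) -> dict[str, float]:
    """Per-prompt head-to-head wins versus mean score for two models on the same prompts."""
    if len(a) != len(b):
        raise ValueError("both models must be scored on the same prompts")
    wins_a = sum(x > y for x, y in zip(a, b))
    wins_b = sum(y > x for x, y in zip(a, b))
    return {"wins_a": wins_a, "wins_b": wins_b, "mean_a": mean(a), "mean_b": mean(b)}

# Made-up illustrative scores; NOT the results reported in the thread.
model_a = [8, 8, 8, 3]
model_b = [7, 7, 7, 10]
print(compare(model_a, model_b))
# {'wins_a': 3, 'wins_b': 1, 'mean_a': 6.75, 'mean_b': 7.75}
```

This is why win counts tend to reward consistency while averages are sensitive to occasional large failures or standout answers, which is the reliability trade-off the comparison highlights.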

#gemma-4 #qwen3.5 #benchmarks

LLM Reddit Apr 5, 2026 1 min read

LocalLLaMA warns against judging Gemma 4 too early while llama.cpp fixes are still landing

A fresh LocalLLaMA thread argues that some early Gemma 4 failures are really inference-stack bugs rather than model-quality problems. By linking active llama.cpp pull requests and user reports from after updating, the post reframes launch benchmarks as a test of the whole stack, not just the model.

#gemma-4 #llama-cpp #inference

LLM Hacker News Apr 5, 2026 1 min read

HN spotlights Caveman, a Claude Code plugin that trims tokens with “caveman” responses

The GitHub project Caveman claims it can cut output tokens by about 75% by stripping filler language while preserving code and technical terms. On Hacker News, developers are treating it as a serious experiment in reducing agent cost, latency, and verbosity.

#claude-code #developer-tools #plugins