LLM

LLM Reddit Apr 20, 2026 2 min read

Qwen3.6 lit up LocalLLaMA because the agent actually debugged the app

r/LocalLLaMA pushed this post past 900 points because it was not another score table. The hook was a local coding agent noticing and fixing its own canvas and wave-completion bugs.

#qwen #local-llm #agents

LLM Reddit Apr 20, 2026 2 min read

Qwen3.6 on an M5 Max Made r/LocalLLaMA Talk About Keeping Code Local

r/LocalLLaMA pushed this post up because the “trust me bro” report had real operating conditions: 8-bit quantization, 64k context, OpenCode, and Android debugging.

#qwen #local-llm #coding-agents

LLM Reddit Apr 20, 2026 1 min read

llama.cpp’s Speculative Checkpointing Turned Local Inference Into a Parameter Hunt

LocalLLaMA upvoted the merge because it is immediately testable, but the useful caveat was clear: speedups depend heavily on prompt repetition and draft acceptance.

#llama.cpp #inference #local-llm

LLM Reddit Apr 19, 2026 2 min read

LocalLLaMA’s Qwen 3.6 Thread Is Really About Configuration

LocalLLaMA reacted because the post was not just another “new model feels strong” claim. The author said Qwen 3.6, running on an M5 Max 128GB setup, handled workloads normally reserved for Opus and Codex, but the practical hook was the warning to enable preserve_thinking.

#qwen #local-llm #configuration

LLM Hacker News Apr 19, 2026 2 min read

HN Is Testing Opus 4.7’s Tokenizer, Not Just Complaining About Limits

HN upvoted this because it turned vague limit anxiety into numbers. Tokenomics says 541 anonymous submissions averaged 466 request tokens on Opus 4.7 versus 349 on Opus 4.6, a 38.1% increase, and the thread immediately argued over what that means for real Claude usage.
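As a quick sanity check on those figures, here is a minimal sketch of the percent-increase arithmetic. Computed directly from the two quoted averages, the increase comes out near 33.5% rather than 38.1%, so the reported figure was presumably derived some other way, for example as a mean of per-submission increases. That explanation is an assumption, not something the summary states, and the sample pairs below are hypothetical values chosen only to show that the two methods can disagree.

```python
from statistics import mean


def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


def mean_of_increases(pairs: list[tuple[float, float]]) -> float:
    """Average the per-pair percent increases (one pair per submission)."""
    return mean(percent_increase(old, new) for old, new in pairs)


def increase_of_means(pairs: list[tuple[float, float]]) -> float:
    """Percent increase between the average old value and the average new value."""
    return percent_increase(mean(o for o, _ in pairs), mean(n for _, n in pairs))


# Averages quoted in the summary: 349 tokens (Opus 4.6) vs 466 tokens (Opus 4.7).
print(f"{percent_increase(349, 466):.1f}%")  # prints 33.5%

# Hypothetical (old, new) token counts, NOT data from the Tokenomics submissions.
hypothetical_pairs = [(100.0, 150.0), (400.0, 440.0)]
print(f"mean of per-pair increases: {mean_of_increases(hypothetical_pairs):.1f}%")
print(f"increase of the means:      {increase_of_means(hypothetical_pairs):.1f}%")
```

Short prompts with large relative growth pull a mean of per-submission increases upward, which is one way a headline percentage can exceed what the two averages alone imply.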

#llm #anthropic #tokenizer

LLM Reddit Apr 19, 2026 2 min read

A 145-result coding eval put Kimi K2.6, Opus 4.7, GLM 5.1 and Minimax under LocalLLaMA review

LocalLLaMA cared about this eval post because it mixed leaderboard data with lived coding-agent pain: Opus 4.7 scored well, but the author says it felt worse in real use.

#coding-agents #benchmarks #kimi

LLM Reddit Apr 19, 2026 2 min read

Local tool calling hit LocalLLaMA’s reality check: model, quant, or harness?

An r/LocalLLaMA thread turned one user’s failed local tool-calling setup into a practical checklist: OpenWebUI, native tool calls, quants, runtimes and wrappers all matter.

#local-llm #tool-calling #qwen

LLM Apr 19, 2026 1 min read

LLM judges miss unsafe answers 30% more when stakes are named

A new arXiv preprint reports that LLM judges became meaningfully more lenient when prompts framed evaluation consequences, exposing a weak point in automated safety and quality benchmarks.

#llm-evals #ai-safety #benchmarks

LLM Reddit Apr 19, 2026 1 min read

A Qwen3.6 tuning post made --n-cpu-moe the LocalLLaMA knob of the day

r/LocalLLaMA cared because the numbers were concrete: 79 t/s on an RTX 5070 Ti with 128K context, tied to one llama.cpp flag choice.

#qwen #llama-cpp #local-llm

LLM Reddit Apr 19, 2026 1 min read

r/LocalLLaMA asked why flagship model weights do not leak more often

The thread was popular because it turned a naive-sounding question into a useful map of access control, logging, and career risk.

#model-weights #security #llm-ops

LLM Apr 18, 2026 2 min read

Codex now controls apps, browsers, and images for 3M weekly devs

OpenAI says more than 3 million developers use Codex each week, and the desktop app is now moving beyond code edits. The update adds background computer use on macOS, an in-app browser, gpt-image-1.5 image generation, 90+ new plugins, PR review workflows, SSH devboxes in alpha, automations, and a memory preview.

#openai #codex #agents

LLM Hacker News Apr 18, 2026 2 min read

MacMind made HN see transformers with the hood open

HN upvoted MacMind because it shrinks transformer mystique to something inspectable: 1,216 parameters in HyperTalk on a Macintosh SE/30. The demo learns bit-reversal for FFT using embeddings, positional encoding, self-attention, backpropagation and gradient descent.

#transformers #hypercard #retro-computing