A highly ranked Hacker News thread amplified Apple's paper on simple self-distillation for code generation, a training recipe that improves pass@1 without verifier models or reinforcement learning.
#llm
A Hacker News discussion surfaced a new paper showing that a model can improve coding performance by training on its own sampled answers. The authors report Qwen3-30B-Instruct rising from 42.4% to 55.3% pass@1 on LiveCodeBench v6 without a verifier, a teacher model, or reinforcement learning.
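At its core, the recipe is a loop of sampling from the model and fine-tuning on those samples. Here is a minimal sketch of that loop, assuming a Hugging Face causal LM; the checkpoint name, sampling settings, and the SFT step are illustrative placeholders, not the paper's exact recipe:

```python
# Self-distillation sketch: the model generates its own training data.
# Checkpoint name, sampling settings, and SFT details are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-Instruct"  # assumed checkpoint name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def sample_answers(prompt: str, k: int = 8) -> list[str]:
    """Draw k completions from the model itself -- no verifier, no teacher."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,
        max_new_tokens=512,
        num_return_sequences=k,
    )
    new_tokens = out[:, inputs["input_ids"].shape[1]:]
    return tok.batch_decode(new_tokens, skip_special_tokens=True)

prompts = ["Write a Python function that ..."]  # placeholder task set
distill_set = [(p, a) for p in prompts for a in sample_answers(p)]
# Fine-tune the same model on distill_set with ordinary supervised
# fine-tuning (e.g. trl's SFTTrainer); the paper's sample-selection
# details are not reproduced here.
```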
Stanford's public CS25 course is again operating as an open lecture stream for Transformer research, with Zoom access, recordings, and a community layer that extends beyond campus.
Lemonade packages local AI inference behind an OpenAI-compatible server that targets GPUs and NPUs, aiming to make open models easier to deploy on everyday PCs.
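Because the server speaks the OpenAI API, any standard client can point at it. A short sketch with the official openai Python package; the localhost port and model name below are assumptions, not Lemonade's documented defaults:

```python
# Talking to a local OpenAI-compatible server with the openai client.
# The base_url port and model identifier are assumptions for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed-locally",         # local servers typically ignore this
)

resp = client.chat.completions.create(
    model="local-model",                  # placeholder model identifier
    messages=[{"role": "user", "content": "Say hello from my GPU."}],
)
print(resp.choices[0].message.content)
```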
A Reddit thread in r/LocalLLaMA drew 142 upvotes and 29 comments around CoPaw-9B. The discussion focused on its positioning as a Qwen3.5-based 9B agent model, its 262,144-token context window, and whether local users would quickly get GGUF or other quantized builds.
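For readers wondering what a GGUF build would buy them, this is how a quantized file is typically loaded locally with llama-cpp-python; the file path is hypothetical (no official GGUF existed per the thread), and the context size is cut down because the full 262,144-token window needs far more memory than most local setups have:

```python
# Loading a hypothetical GGUF quantization of CoPaw-9B locally.
# Path and settings are placeholders, not a released artifact.
from llama_cpp import Llama

llm = Llama(
    model_path="copaw-9b-q4_k_m.gguf",  # hypothetical quantized file
    n_ctx=32768,                        # practical slice of the 262,144 window
    n_gpu_layers=-1,                    # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this repo's README."}],
)
print(out["choices"][0]["message"]["content"])
```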
Cloudflare said on March 30, 2026 that its advanced Client-Side Security tools are now available to all users. The accompanying blog post says the release combines graph neural networks with LLM triage, cuts false positives by up to 200x, makes advanced client-side protections self-serve, and adds domain-based threat intelligence to the free bundle.
Mistral announced Mistral Small 4 on March 16, 2026 as a single open model that combines reasoning, multimodal input, and agentic coding. Key specs include 119B total parameters, 6B active parameters per token, a 256k context window, Apache 2.0 licensing, and configurable reasoning effort.
Mistral introduced Leanstral on March 16, 2026 as an open-source code agent built specifically for Lean 4. The release combines 6B active parameters, an Apache 2.0 license, a new FLTEval benchmark, and immediate availability in Mistral Vibe, via API, and as downloadable weights.
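For context on the task domain, this is the flavor of Lean 4 goal such an agent is asked to close; the example is a generic one, not drawn from FLTEval:

```lean
-- A toy Lean 4 theorem of the kind a Lean-focused agent would target.
-- Generic illustration; not taken from the FLTEval benchmark.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```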
A new r/MachineLearning post pushes TurboQuant beyond KV-cache talk and into weight compression, with a GitHub implementation that targets drop-in low-bit LLM inference.
ngrok’s March 25, 2026 explainer lays out how quantization can make LLMs roughly 4x smaller and 2x faster, and what the real 4-bit versus 8-bit tradeoff looks like. Hacker News drove the post to 247 points and 46 comments, reopening the discussion around memory bottlenecks and the economics of local inference.
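The 4x figure follows directly from storage width: fp16 weights take 2 bytes each and int4 takes half a byte, so a 7B-parameter model drops from roughly 14 GB to about 3.5 GB. A toy round-to-nearest sketch in NumPy (not ngrok's code, and far simpler than production schemes like GPTQ or AWQ) shows where the 4-bit versus 8-bit accuracy gap comes from:

```python
# Toy symmetric round-to-nearest quantization of one weight tensor.
# Illustrative only; real schemes use per-group scales, calibration, etc.
import numpy as np

def quantize(w: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float32)

for bits in (8, 4):
    q, scale = quantize(w, bits)
    err = np.abs(w - q * scale).mean()    # dequantize and compare
    print(f"int{bits}: mean abs error {err:.2e}")
# int4 has 16 levels vs int8's 256, so with the same range its
# quantization step -- and hence reconstruction error -- is ~16x larger.
```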
Google Research said on March 16, 2026 that its superconductivity case study found curated-source systems outperforming open-web LLMs. NotebookLM and a custom RAG setup scored highest on expert-written questions about high-temperature superconductors.
Google introduced Gemini 3.1 Flash-Lite on March 3, 2026 as its fastest and lowest-cost Gemini 3 series model. The preview release targets high-volume developer workloads with lower pricing, reduced latency, and stronger benchmark scores than the prior 2.5 Flash tier.
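Developers can try the preview through the google-genai Python SDK; the model string below is a guess at the preview identifier, so check the live model list before relying on it:

```python
# Calling the preview model with the google-genai SDK.
# The model string is an assumption; confirm the real preview identifier.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
resp = client.models.generate_content(
    model="gemini-3.1-flash-lite-preview",  # assumed identifier
    contents="Classify this support ticket: 'My export job hangs at 99%.'",
)
print(resp.text)
```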