#llm

LLM Reddit Mar 21, 2026 1 min read

r/LocalLLaMA Spots Mistral 4 Landing in Transformers with 119B MoE and 256k Context

A merged Hugging Face Transformers PR surfaced on r/LocalLLaMA shows Mistral 4 as a hybrid instruct/reasoning model with 128 experts, 4 active experts, 6.5B activated parameters per token, 256k context, and Apache 2.0 licensing.

#llm #mistral #open-models

LLM Hacker News Mar 21, 2026 2 min read

Hacker News Tracks Moonshot AI’s Attention Residuals as a Drop-In Upgrade for Transformer Depth

The March 20, 2026 HN discussion around Attention Residuals focused on a simple claim with large implications: replace fixed residual addition with learned depth-wise attention and recover performance with modest overhead.

#llm #transformers #research

LLM Hacker News Mar 20, 2026 2 min read

Hacker News Examines NanoGPT Slowrun's 10x Data-Efficiency Claim

Q Labs says 100M tokens and an 18B-parameter ensemble can match a 1B-token baseline, and Hacker News immediately focused on whether that gain survives serving and deployment.

#llm #training #scaling-laws

LLM Hacker News Mar 20, 2026 2 min read

Hacker News Debates What 16 GPUs Really Changed in Karpathy's Autoresearch

SkyPilot says Claude Code ran about 910 autoresearch experiments in 8 hours, and Hacker News focused on whether the real breakthrough was agent strategy, infrastructure, or both.

#llm #gpus #agents

LLM Reddit Mar 19, 2026 2 min read

LocalLLaMA highlights Mamba-3, a state space model built around inference efficiency

A LocalLLaMA thread on March 18, 2026 pushed fresh attention toward Mamba-3, a new state space model release from researchers at Carnegie Mellon University, Princeton, Cartesia AI, and Together AI. The project shifts its design goal from training speed to inference efficiency and claims prefill+decode latency wins over Mamba-2, Gated DeltaNet, and Llama-3.2-1B at the 1.5B scale.

#mamba-3 #ssm #inference

LLM Reddit Mar 18, 2026 2 min read

r/MachineLearning highlights mlx-tune for Apple Silicon LLM fine-tuning with an Unsloth-style API

A project post in r/MachineLearning points to mlx-tune, a library that wraps Apple’s MLX stack in an Unsloth-compatible training API for SFT, DPO, GRPO, LoRA, and vision-language fine-tuning on Apple Silicon Macs.

#apple-silicon #mlx #fine-tuning

LLM Reddit Mar 17, 2026 3 min read

Covenant-72B puts permissionless distributed GPU training ahead of raw hype

An r/LocalLLaMA post that reached 92 points and 25 comments spotlighted Covenant-72B as a 72B-parameter model trained from scratch by 20+ participants through decentralized infrastructure on the Bittensor blockchain. The most credible story here is not an unsupported performance victory, but a concrete demonstration of permissionless collaborative pre-training, SparseLoCo-based communication reduction, Apache 2.0 licensing, and a separate chat-tuned variant.

#llm #decentralized-training #bittensor

LLM Reddit Mar 17, 2026 2 min read

Unsloth Studio beta goes after the local model workflow in one interface

A high-engagement r/LocalLLaMA post highlighted Unsloth Studio, a beta open-source web UI that aims to train, run, and export open models from one local interface. The discussion framed it as a possible LM Studio challenger in the GGUF ecosystem, while top commenters noted that many advanced users still lean on vLLM or direct llama.cpp workflows.

#llm #unsloth #gguf

LLM Reddit Mar 16, 2026 2 min read

LocalLLaMA Tracks NVIDIA’s Nemotron License Change and What It Means for Derivative Models

A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.

#nvidia #nemotron #licensing

LLM Reddit Mar 16, 2026 2 min read

LocalLLaMA Pushes GreenBoost, a Linux Driver That Extends NVIDIA GPU Memory with RAM and NVMe

A LocalLLaMA thread amplified Phoronix coverage of GreenBoost, an experimental GPLv2 Linux module that adds a multi-tier memory path for NVIDIA GPUs. The design pairs a kernel module with a CUDA shim so large allocations can spill from limited on-card VRAM into pinned system RAM and NVMe-backed storage without modifying CUDA applications.

#nvidia #vram #cuda

LLM X/Twitter Mar 14, 2026 2 min read

Together AI Open-Sources Open Deep Research v2 with Dataset, Code, and a Multi-Step Research Workflow

Together AI said on March 13, 2026 that v2 of Open Deep Research is fully free and open source. The companion blog describes a planner and self-reflection workflow for multi-hop web research and ships code plus evaluation assets for developers.

#deep-research #open-source #agents

LLM Hacker News Mar 12, 2026 2 min read

Hacker News Examines a Context-Aware Permission Guard for Claude Code

A Show HN post for nah introduced a PreToolUse hook that classifies tool calls by effect instead of relying on blanket allow-or-deny rules. The README emphasizes path checks, content inspection, and optional LLM escalation, while HN discussion focused on sandboxing, command chains, and whether policy engines can really contain agentic tools.

#llm #agent-safety #claude-code