SkyPilot says Claude Code ran about 910 autoresearch experiments in 8 hours, and the Hacker News discussion focused on whether the real breakthrough was agent strategy, infrastructure, or both.
#llm
A LocalLLaMA thread on March 18, 2026 pushed fresh attention toward Mamba-3, a new state space model release from researchers at Carnegie Mellon University, Princeton, Cartesia AI, and Together AI. The project shifts its design goal from training speed to inference efficiency and claims prefill+decode latency wins over Mamba-2, Gated DeltaNet, and Llama-3.2-1B at the 1.5B scale.
Google introduced Gemini 3.1 Flash-Lite on March 3, 2026 as its fastest and most cost-efficient Gemini 3 series model. The model is rolling out in preview through the Gemini API in Google AI Studio and Vertex AI, with pricing of $0.25/1M input tokens and $1.50/1M output tokens, plus claimed gains over Gemini 2.5 Flash of 2.5x faster time to first answer token and 45% higher output speed.
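Taking the quoted preview prices at face value, per-request cost is a simple linear function of token counts. A minimal sketch (the function name and example token counts are illustrative, not from the announcement):

```python
# Quoted preview pricing for Gemini 3.1 Flash-Lite, per the announcement:
# $0.25 per 1M input tokens, $1.50 per 1M output tokens.
INPUT_USD_PER_M = 0.25
OUTPUT_USD_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_USD_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_USD_PER_M

# Example: a 10k-token prompt producing a 1k-token reply.
print(round(request_cost(10_000, 1_000), 6))  # → 0.004
```

At these rates, even a million such requests would run about $4,000, which is the kind of arithmetic driving interest in the Flash-Lite tier for high-volume workloads.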
A project post in r/MachineLearning points to mlx-tune, a library that wraps Apple’s MLX stack in an Unsloth-compatible training API for SFT, DPO, GRPO, LoRA, and vision-language fine-tuning on Apple Silicon Macs.
An r/LocalLLaMA post that reached 92 points and 25 comments spotlighted Covenant-72B as a 72B-parameter model trained from scratch by 20+ participants through decentralized infrastructure on the Bittensor blockchain. The most credible story here is not an unsupported performance victory, but a concrete demonstration of permissionless collaborative pre-training, SparseLoCo-based communication reduction, Apache 2.0 licensing, and a separate chat-tuned variant.
A high-engagement r/LocalLLaMA post highlighted Unsloth Studio, a beta open-source web UI that aims to train, run, and export open models from one local interface. The discussion framed it as a possible LM Studio challenger in the GGUF ecosystem, while top commenters noted that many advanced users still lean on vLLM or direct llama.cpp workflows.
On March 5, 2026, OpenAI introduced GPT-5.4 as a flagship model focused on relevance, contextual understanding, and instruction following. In the API, it pairs a 1M-token context window with stronger tool search for long, multi-tool workflows.
Google DeepMind updated Gemini 3.1 Flash-Lite on March 3, 2026, positioning it as a low-cost model for high-volume, low-latency work. Google says it supports 128k input, 8k output, multimodal input, native audio generation, and pricing from $0.10 per 1M input tokens.
A high-signal LocalLLaMA thread on March 15, 2026 focused on a license swap for NVIDIA’s Nemotron model family. Comparing the current NVIDIA Nemotron Model License with the older Open Model License shows why the community reacted: the old guardrail-termination clause and Trustworthy AI cross-reference are no longer present, while the newer text leans on a simpler NOTICE-style attribution structure.
A LocalLLaMA thread amplified Phoronix coverage of GreenBoost, an experimental GPLv2 Linux module that adds a multi-tier memory path for NVIDIA GPUs. The design pairs a kernel module with a CUDA shim so large allocations can spill from limited on-card vRAM into pinned system RAM and NVMe-backed storage without modifying CUDA applications.
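The spill behavior described above can be modeled as a simple allocation fallback across ordered tiers. The sketch below is a toy Python model, not GreenBoost's actual code: tier names and capacities are hypothetical, and the real design interposes on CUDA allocations in the shim rather than counting bytes in Python.

```python
class Tier:
    """One level of the memory hierarchy (e.g. VRAM, pinned RAM, NVMe)."""
    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity  # bytes this tier can hold
        self.used = 0

    def try_alloc(self, size: int) -> bool:
        if self.used + size <= self.capacity:
            self.used += size
            return True
        return False

class TieredAllocator:
    """Place each allocation in the fastest tier with room, spilling down."""
    def __init__(self, tiers):
        self.tiers = tiers  # ordered fastest-first

    def alloc(self, size: int) -> str:
        for tier in self.tiers:
            if tier.try_alloc(size):
                return tier.name
        raise MemoryError("all tiers exhausted")

GiB = 1 << 30
alloc = TieredAllocator([Tier("vram", 8 * GiB),        # limited on-card memory
                         Tier("pinned_ram", 32 * GiB), # pinned system RAM
                         Tier("nvme", 256 * GiB)])     # NVMe-backed storage
print(alloc.alloc(6 * GiB))  # fits on-card → "vram"
print(alloc.alloc(4 * GiB))  # VRAM full (6+4 > 8) → spills to "pinned_ram"
```

The interesting part of the reported design is that this fallback happens behind an unmodified CUDA allocation call, so applications see one large address space rather than three explicit pools.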
Together AI said on March 13, 2026 that v2 of Open Deep Research is fully free and open source. The companion blog describes a planner and self-reflection workflow for multi-hop web research and ships code plus evaluation assets for developers.
A widely shared r/LocalLLaMA post from a former Manus backend lead argues that a single run(command="...") interface often beats a catalog of typed function calls for agents. The post ties Unix text streams to token-based model interfaces, then backs the claim with design patterns around piping, progressive help, stderr visibility, and overflow handling.
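The pattern the post advocates can be sketched in a few lines: one shell-backed tool instead of many typed ones, with stderr surfaced rather than swallowed and long output truncated instead of overflowing the context. This is a minimal sketch of that idea, not the author's implementation; the `MAX_CHARS` limit and output framing are assumptions.

```python
import subprocess

MAX_CHARS = 2_000  # overflow handling: truncate rather than flood the context

def run(command: str, timeout: int = 30) -> str:
    """Single agent tool in the spirit of the post: one run(command=...)
    entry point. Piping composes for free because the shell interprets
    the command string, and stderr stays visible to the model."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=timeout)
    out = proc.stdout
    if proc.stderr:
        out += f"\n[stderr]\n{proc.stderr}"  # surface errors, don't hide them
    if len(out) > MAX_CHARS:
        out = out[:MAX_CHARS] + f"\n[truncated {len(out) - MAX_CHARS} chars]"
    return f"[exit {proc.returncode}]\n{out}"

# Unix composition without any extra tool definitions:
print(run("printf 'b\\na\\n' | sort | head -n 1"))
```

Compared with a catalog of typed function calls, this trades schema validation for composability: the model gets `sort`, `grep`, and `--help`-style progressive discovery without the agent framework enumerating each capability up front.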