A recent r/LocalLLaMA post presents Qwen3.5 27B as an unusually strong local inference sweet spot. The author reports about 19.7 tokens per second on an RTX A6000 48GB with llama.cpp and a 32K context, while the comments turn into a detailed debate about dense-versus-MoE VRAM economics.
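For readers following the VRAM debate, the dense-model side is easy to estimate with a back-of-the-envelope script: weight bytes at a given quantization plus KV cache for the chosen context. The architecture numbers below (layer count, KV heads, head dimension, bits per weight) are illustrative assumptions, not Qwen3.5 27B's actual config.

```python
# Rough VRAM estimate for a dense model: quantized weights plus KV cache.
# All architecture numbers here are illustrative assumptions.

def dense_vram_gb(params_b, bits_per_weight, n_layers, n_kv_heads,
                  head_dim, context, kv_bits=16):
    weights = params_b * 1e9 * bits_per_weight / 8                 # weight bytes
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes/elem
    kv = 2 * n_layers * n_kv_heads * head_dim * context * kv_bits / 8
    return (weights + kv) / 1e9

# Hypothetical 27B dense config at ~4.5 bits/weight with a 32K context.
print(f"{dense_vram_gb(27, 4.5, 60, 8, 128, 32_768):.1f} GB")
```

The usual MoE counterpoint, and the crux of the thread, is that every expert still has to be resident in VRAM even though only a few are active per token, so total-parameter size rather than active-parameter size drives the memory bill.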
MegaTrain proposes training 100B+ parameter LLMs at full precision on a single GPU by keeping parameters and optimizer states in host memory and streaming layers through the device. The recent Hacker News interest is notable because the paper reframes the problem as one of memory-system design rather than simple GPU count.
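The core idea is easy to picture with a minimal offloading loop: parameters live in host memory and each layer is copied onto the GPU only while it executes. This is a toy sketch of the general technique (assuming a CUDA device), not MegaTrain's implementation, which also streams optimizer states through the backward pass and overlaps transfers with compute.

```python
import torch
import torch.nn as nn

# Layer streaming sketch: weights stay on the CPU and visit the GPU
# only for the duration of their forward pass.
layers = nn.ModuleList([nn.Linear(4096, 4096) for _ in range(8)])  # host memory
x = torch.randn(2, 4096, device="cuda")

for layer in layers:
    layer.to("cuda", non_blocking=True)   # stream this layer's weights in
    x = layer(x)                          # compute on the device
    layer.to("cpu")                       # free VRAM for the next layer

print(x.shape)
```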
On April 7, 2026, OpenAI’s Tibo Sottiaux said Codex reached 3 million weekly users. He added that the jump from 2 million to 3 million took less than a month, and OpenAI will reset usage limits at each additional million users until the product reaches 10 million weekly users.
A popular r/LocalLLaMA self-post lays out a concrete 2x H200 serving stack for GPT-OSS-120B, including routing, monitoring, and queueing tradeoffs. The appeal is not just the headline throughput, but the unusually detailed operational data behind it.
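As a reference point for the model-hosting layer of such a stack, a minimal vLLM tensor-parallel load across two GPUs looks like the sketch below; the model id and sampling settings are assumptions, and the post's routing, monitoring, and queueing layers would sit in front of a server like this rather than inside it.

```python
from vllm import LLM, SamplingParams

# Minimal two-GPU tensor-parallel load; illustration only, not the poster's stack.
llm = LLM(model="openai/gpt-oss-120b", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize the tradeoffs of tensor parallelism."], params)
print(outputs[0].outputs[0].text)
```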
Anthropic's April 7, 2026 security write-up for Claude Mythos Preview argues that frontier LLM gains are now translating into real exploit-development capability. Hacker News is treating the post as a sign that defensive tooling and offensive risk are accelerating together.
A high-signal r/LocalLLaMA thread is circulating practical Gemma 4 fine-tuning guidance from Unsloth. The post claims Gemma-4-E2B and E4B can be adapted locally with 8GB VRAM, about 1.5x faster training, roughly 60% less VRAM than FA2 setups, and several fixes for early Gemma 4 training and inference bugs.
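The flow the thread describes is roughly a 4-bit base model with LoRA adapters attached, which is how an 8GB VRAM figure becomes plausible. The model identifier and LoRA settings below are placeholders, so check Unsloth's Gemma 4 notebooks for the exact names and recommended hyperparameters.

```python
from unsloth import FastLanguageModel

# Sketch of the Unsloth recipe: quantized base plus LoRA adapters.
# The model id is a guessed placeholder, not a confirmed release name.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-4-e2b",   # hypothetical identifier
    max_seq_length=4096,
    load_in_4bit=True,                  # 4-bit base keeps weights within 8GB
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                               # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# Training then proceeds with trl's SFTTrainer on an instruction dataset,
# as in Unsloth's published notebooks.
```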
A detailed r/MachineLearning post is drawing attention to Dante-2B, a 2.1B dense Italian/English model trained from scratch on 2×H200 GPUs. The project emphasizes tokenizer efficiency for Italian, a 300B token corpus, and a fully open release of weights, tokenizer, and training pipeline after phase 2.
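Tokenizer efficiency is the kind of claim that is easy to check once the release lands: count tokens per word on an Italian sample and compare against familiar baselines. The snippet below uses two public tokenizers as stand-ins; swap in whatever identifier the Dante-2B authors publish.

```python
from transformers import AutoTokenizer

# Tokens per word is a quick proxy for tokenizer efficiency: fewer tokens on the
# same Italian text means more effective context and a cheaper pass over the corpus.
sample = "Nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura."

for name in ["gpt2", "EleutherAI/gpt-neox-20b"]:   # baselines; add Dante-2B's id here
    tok = AutoTokenizer.from_pretrained(name)
    n_tokens = len(tok.encode(sample))
    n_words = len(sample.split())
    print(f"{name}: {n_tokens} tokens, {n_tokens / n_words:.2f} tokens/word")
```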
Hacker News picked up Z.ai's GLM-5.1 as a model aimed less at one-shot wins and more at sustained agentic work. Z.ai reports 58.4 on SWE-Bench Pro, 42.7 on NL2Repo, 66.5 on Terminal Bench 2.0, and long-horizon runs that keep improving through hundreds of iterations and thousands of tool calls.
GitHub Changelog's March 19, 2026 X post announced that GPT-5.3-Codex is the first long-term support model for Copilot Business and Copilot Enterprise. GitHub says the model launched on February 5, 2026, stays available through February 4, 2027, and becomes the new base model by May 17, 2026.
GitHub Changelog said on April 3, 2026 that GPT-5.1 Codex, GPT-5.1-Codex-Max, and GPT-5.1-Codex-Mini were deprecated across all Copilot surfaces as of April 1. GitHub tells organizations to move workflows and model policies to supported models, with GPT-5.3-Codex named as the replacement.
GitHub Changelog's April 7, 2026 X post said Copilot CLI can now connect to Azure OpenAI, Anthropic, and other OpenAI-compatible endpoints, or run fully local models instead of GitHub-hosted routing. GitHub's changelog adds that offline mode disables telemetry, unauthenticated use is possible with provider credentials alone, and built-in sub-agents inherit the chosen provider.
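For anyone unsure what "OpenAI-compatible endpoint" buys them, the pattern is the standard chat-completions protocol with a different base URL and key. The snippet below shows a generic client pointed at a local server (vLLM, llama.cpp's llama-server, Ollama, and similar all expose this interface); it is an illustration of the protocol, not Copilot CLI's own configuration syntax.

```python
from openai import OpenAI

# Any server speaking the /v1/chat/completions protocol can be addressed by
# swapping base_url and api_key; the model name is whatever the server exposes.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="local-model",   # placeholder name for the locally served model
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```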
A LocalLLaMA thread drew attention to DFlash, a block-diffusion draft model for speculative decoding whose paper claims lossless acceleration above 6x and direct support for vLLM, SGLang, and selected Transformers backends.
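Speculative decoding itself is a simple draft-and-verify loop: a cheap drafter proposes a block of tokens, the target model checks the whole block in one forward pass, and the longest agreeing prefix is kept, so output quality matches plain target decoding while fewer target passes are needed. The toy sketch below shows where a block drafter like DFlash slots in; the functions are stand-ins, not DFlash's actual interface, and real implementations verify against the target model's distribution rather than matching toy token ids.

```python
# Toy draft-and-verify step for greedy speculative decoding.
def speculative_step(prefix, draft, target, k=4):
    proposal = draft(prefix, k)              # k cheap tokens from the draft model
    verified = target(prefix, proposal)      # one target pass over the whole block
    accepted = []
    for d, t in zip(proposal, verified):
        if d != t:                           # first disagreement: keep the target's
            accepted.append(t)               # token and discard the rest of the block
            break
        accepted.append(d)
    return prefix + accepted

# Tiny stand-in "models" over token ids, for illustration only.
draft = lambda prefix, k: [(prefix[-1] + i + 1) % 10 for i in range(k)]
target = lambda prefix, proposal: [(prefix[-1] + i + 1) % 10 for i in range(len(proposal))]

print(speculative_step([3], draft, target))  # drafter agrees here, so all 4 tokens land
```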