The LocalLLaMA thread took off because native speech-to-text inside llama.cpp is exactly the kind of feature that removes an extra pipeline from local agent setups. The post says llama-server can now run STT with Gemma-4 E2A and E4A models, and commenters immediately started comparing the practical experience to Whisper and Voxtral.
Anthropic's new Claude Code Routines package a prompt, repositories, and connectors into cloud-run automations that can fire on schedules, API calls, or GitHub events. Hacker News liked the idea immediately, but the comments went straight to the hard question: how useful is more autonomy if usage limits stay tight?
GitHub is making third-party coding agents less static: Claude and Codex users on github.com can now choose among 4 Anthropic models and 3 OpenAI models when they launch a task. That matters because model choice changes latency, spend, and code quality far more than a small UI toggle suggests.
GitHub is turning Copilot compliance from slideware into deployable policy: US and EU data residency now covers all generally available Copilot features, and US government deployments get FedRAMP Moderate infrastructure. The practical catch is cost, with data-resident requests priced at a 1.1x model multiplier.
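As a rough sketch of what that multiplier means for spend (only the 1.1x figure comes from the announcement; the per-request base cost and volume below are hypothetical, and the sketch assumes the multiplier applies linearly per request):

```python
# Data-residency cost multiplier from the announcement; everything else
# in this example is a made-up illustration.
DATA_RESIDENT_MULTIPLIER = 1.1

def data_resident_cost(base_cost_per_request: float, requests: int) -> float:
    """Total spend, assuming the 1.1x multiplier applies to each request."""
    return base_cost_per_request * DATA_RESIDENT_MULTIPLIER * requests

standard = 0.04 * 10_000                     # hypothetical non-resident spend
resident = data_resident_cost(0.04, 10_000)  # same volume, data-resident
print(f"residency overhead: {resident - standard:.2f}")
```

In other words, the compliance story is clean, but a team routing all traffic through data-resident endpoints should budget a flat 10% premium on model costs.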
OpenAI is separating defensive cyber use from broad model access: verified individuals and vetted teams can now reach a cyber-permissive GPT-5.4 variant with binary reverse engineering support. The move matters because TAC is expanding from a narrow program to thousands of defenders and hundreds of teams.
Anthropic is using Claude not just as a model to align, but as a researcher that improved weak-to-strong supervision nearly to the ceiling. In the linked study, nine Claude Opus 4.6 agents pushed performance-gap recovery from a 0.23 human baseline to 0.97 after 800 cumulative research hours.
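For context, performance-gap recovery (PGR) in the weak-to-strong supervision literature measures how much of the gap between a weak supervisor and a strong ceiling a weakly supervised strong model closes; the 0.23 and 0.97 figures above are PGR values, not raw accuracies. A minimal sketch of the standard definition (the example scores are hypothetical, and the linked study may define its metric slightly differently):

```python
def performance_gap_recovered(weak: float, ceiling: float, weak_to_strong: float) -> float:
    """PGR = (weak_to_strong - weak) / (ceiling - weak).

    0.0 means the weakly supervised model does no better than its weak
    supervisor; 1.0 means it fully matches the strong ceiling.
    """
    return (weak_to_strong - weak) / (ceiling - weak)

# Hypothetical task accuracies: weak supervisor 60%, strong ceiling 90%,
# weakly supervised strong model 89.1% -> recovers 97% of the gap.
print(performance_gap_recovered(0.60, 0.90, 0.891))
```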
r/MachineLearning treated this less like a finished breakthrough and more like a serious challenge to the current assumptions around large-scale spike-domain training. The April 13, 2026 post reported a 1.088B pure SNN language model reaching loss 4.4 at 27K steps with 93% sparsity, while commenters pushed for more comparable metrics and longer training before drawing big conclusions.
LocalLLaMA paid attention to this post because it looked like real engineering cleanup instead of another inflated speed screenshot. On April 13, 2026, the author said a stock-MLX baseline for Qwen3.5-9B at 2048 tokens improved from 30.96 tok/s to 127.07 tok/s, with 89.36% acceptance and the full runtime released as open source.
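Sanity-checking the claimed numbers (the figures come from the post; reading the acceptance rate as a speculative-decoding draft-acceptance rate is an assumption):

```python
baseline_tps = 30.96    # reported stock-MLX baseline, tokens/sec
optimized_tps = 127.07  # reported throughput after the runtime changes
acceptance = 0.8936     # reported acceptance rate

speedup = optimized_tps / baseline_tps
print(f"end-to-end speedup: {speedup:.2f}x at {acceptance:.2%} acceptance")
```

A roughly 4.1x gain at roughly 89% acceptance is in the plausible range for speculative decoding, where each verification step accepts several draft tokens on average, which is part of why commenters treated it as engineering rather than benchmark theater.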
Google is no longer treating AI memory as a niche add-on. By bringing Gemini Personal Intelligence to India, it is testing whether a model that reads Gmail, Photos, and watch history can become a daily assistant in one of its biggest markets.
MCP is moving from developer convenience to enterprise control problem. Cloudflare's new architecture matters because it tackles both parts of that shift at once: bloated tool schemas and the security mess created by ungoverned local servers.
Enterprise AI teams are discovering that model quality is only half the problem. OpenAI's tie-up with Cloudflare's Agent Cloud is about collapsing model access, state, storage, and tool execution into one production path instead of yet another demo pipeline.
Long-running CLI agent work no longer has to stay pinned to one screen. GitHub's new copilot --remote feature mirrors a live session to the web or GitHub Mobile, where you can send follow-up commands, switch modes, and handle approvals from another device.