IBM unveiled Granite 4.0 1B Speech on March 9, 2026 as a compact multilingual speech-language model for ASR and bidirectional speech translation. The company says it improves English transcription accuracy over its predecessor while cutting model size in half and adding Japanese support.
OpenAI posted on March 5, 2026 that GPT-5.4 Thinking and GPT-5.4 Pro are rolling out across ChatGPT, the API, and Codex. The launch article positions GPT-5.4 as a professional-work model with 1M-token context, native computer use, stronger tool search, and better spreadsheet, document, and presentation performance.
Azure posted on March 14, 2026 that Claude Opus 4.6 and Sonnet 4.6 now support 1M-token context in Microsoft Foundry with flat pricing and higher media limits. Microsoft and Anthropic documentation confirm the 1M-token window, the 600-image/PDF-page cap, and standard pricing across the full context range.
A fast-rising r/LocalLLaMA thread says the community has already submitted nearly 10,000 Apple Silicon benchmark runs across more than 400 models. The post matters because it replaces scattered anecdotes with a shared dataset that begins to show consistent throughput patterns across M-series chips and context lengths.
A recent r/LocalLLaMA benchmark thread argues that tokens-per-second screenshots hide the real trade-offs between MLX and llama.cpp on Apple Silicon. MLX still wins on short-context generation, but long-context workloads can erase that headline speedup because prefill dominates total latency.
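The prefill-versus-decode trade-off the thread describes can be made concrete with a toy latency model. The sketch below uses entirely hypothetical throughput numbers (not measurements from the thread): total request time is prompt tokens divided by prefill tokens/sec, plus output tokens divided by decode tokens/sec. With those assumptions, a faster decoder wins short prompts but loses once a long prompt makes prefill dominate.

```python
# Toy latency model. All throughput figures below are illustrative
# assumptions, not benchmarks of MLX or llama.cpp.

def total_latency(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Seconds to serve one request: prefill time plus decode time."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

short_ctx = dict(prompt_tokens=500, output_tokens=500)       # chat-style
long_ctx = dict(prompt_tokens=60_000, output_tokens=500)     # long document

# Hypothetical: engine A decodes faster, engine B prefills faster.
a_short = total_latency(**short_ctx, prefill_tps=800, decode_tps=60)
b_short = total_latency(**short_ctx, prefill_tps=1200, decode_tps=45)
a_long = total_latency(**long_ctx, prefill_tps=800, decode_tps=60)
b_long = total_latency(**long_ctx, prefill_tps=1200, decode_tps=45)

# Short context: decode dominates, so the faster decoder (A) wins.
# Long context: prefill dominates, so the faster prefill engine (B) wins.
```

Under these made-up numbers, engine A's decode advantage carries the short request, while at 60K prompt tokens the prefill term swamps it, which is exactly why tokens-per-second screenshots taken at short context can mislead.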
OpenAI said on March 5, 2026 that GPT-5.4 is rolling out across ChatGPT, the API, and Codex. The new model combines GPT-5.3-Codex coding capability with OpenAI’s mainline reasoning stack, adds native computer-use features, and introduces experimental 1M-token context in Codex.
A March 13, 2026 Hacker News thread focused on Anthropic's general-availability update bringing 1M-token context to Claude Opus 4.6 and Sonnet 4.6, especially the removal of long-context pricing premiums. The release also raises media limits to 600 images or PDF pages and rolls the 1M-token window into Claude Code for Max, Team, and Enterprise users.
Google introduced the Developer Knowledge API and an open-source MCP Server on February 4, 2026. The tools are meant to connect internal documentation, public URLs, and other team knowledge sources to Gemini Code Assist and AI-agent workflows with less custom plumbing.
Andrej Karpathy says his autoresearch setup reduced nanochat's Time to GPT-2 from 2.02 hours to 1.80 hours. He said the agent explored roughly 700 changes over about two days and found around 20 additive improvements, but the result should still be read as a source claim rather than an independently audited benchmark.
A Reddit thread surfaced arXiv paper 2603.10145, which argues that the output layer of language models is not just a softmax expressivity issue but an optimization bottleneck that suppresses 95–99% of the gradient norm. The discussion centered on whether better head designs could unlock more efficient LLM training.
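A toy illustration of the general mechanism (this is not the paper's experiment): the gradient of cross-entropy loss with respect to the logits of a softmax head is p − y, so once the model assigns high probability to the correct token, the gradient norm flowing back through the head collapses, regardless of what the rest of the network might still learn.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def head_grad_norm(logits, target):
    """Norm of d(cross-entropy)/d(logits) = softmax(logits) - onehot(target)."""
    p = softmax(logits)
    y = np.zeros_like(p)
    y[target] = 1.0
    return np.linalg.norm(p - y)

rng = np.random.default_rng(0)
logits = rng.normal(size=50_000)          # vocabulary-sized logit vector
uncertain = head_grad_norm(logits, target=0)

logits[0] = logits.max() + 15.0           # make the model confident in the target
confident = head_grad_norm(logits, target=0)

# The confident-model gradient norm is a small fraction of the
# uncertain-model one: the softmax head throttles the learning signal.
```

The thread's question is whether head designs that avoid this throttling would deliver a materially larger useful gradient to the backbone during training.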
A high-scoring discussion in r/MachineLearning asks what benchmarking papers are for when proprietary models change monthly and old versions disappear. The strongest replies argued that model rankings go stale fast, but the datasets and failure cases can remain useful as durable eval assets.
Percepta's March 11 post says it built a computer inside a transformer that can execute arbitrary C programs for millions of steps with exponentially faster inference via 2D attention heads. HN readers saw a provocative research direction, but they also asked for clearer writing, harder benchmarks, and evidence that the idea scales.