LLM

LLM X/Twitter Mar 14, 2026 2 min read

Azure Pushes Claude 4.6 in Microsoft Foundry with 1M-Token Context, 600-Page Inputs, and Flat Pricing

Azure posted on March 14, 2026 that Claude Opus 4.6 and Sonnet 4.6 now support 1M-token context in Microsoft Foundry with flat pricing and higher media limits. Microsoft and Anthropic documentation confirm the 1M window, 600 image/PDF-page cap, and standard pricing across the full context range.

#azure #anthropic #claude

LLM Reddit Mar 14, 2026 2 min read

r/LocalLLaMA: Community benchmark data turns Apple Silicon local LLM claims into something measurable

A fast-rising r/LocalLLaMA thread says the community has already submitted nearly 10,000 Apple Silicon benchmark runs across more than 400 models. The post matters because it replaces scattered anecdotes with a shared dataset that begins to show consistent throughput patterns across M-series chips and context lengths.

#apple-silicon #benchmarks #omlx

LLM Reddit Mar 14, 2026 2 min read

r/LocalLLaMA: The real latency trade-offs between MLX and llama.cpp on M1 Max

A recent r/LocalLLaMA benchmark thread argues that tokens-per-second screenshots hide the real trade-offs between MLX and llama.cpp on Apple Silicon. MLX still wins on short-context generation, but long-context workloads can erase that headline speedup because prefill dominates total latency.
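Simple arithmetic shows why prefill can erase a decode-speed advantage. The Python sketch below assumes end-to-end latency is prompt tokens divided by prefill speed plus output tokens divided by decode speed, and ignores model loading and other overhead. The two backends and all of their throughput figures are hypothetical. They are invented only to show the mechanism and are not MLX or llama.cpp measurements from the thread.

```python
# Hypothetical throughput figures, NOT measurements from the r/LocalLLaMA thread.
def total_latency_s(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Simplified end-to-end latency: prefill time plus decode time (no other overhead)."""
    prefill_s = prompt_tokens / prefill_tps
    decode_s = output_tokens / decode_tps
    return prefill_s + decode_s

# Backend A decodes faster; backend B prefills faster (both invented for illustration).
backend_a = {"prefill_tps": 400.0, "decode_tps": 60.0}
backend_b = {"prefill_tps": 600.0, "decode_tps": 45.0}

for prompt in (500, 32_000):
    a = total_latency_s(prompt, 300, **backend_a)
    b = total_latency_s(prompt, 300, **backend_b)
    print(f"prompt={prompt:>6}: A={a:6.2f}s  B={b:6.2f}s")
```

With these made-up numbers, backend A wins the 500-token prompt (6.25 s vs 7.50 s) but loses the 32,000-token prompt (85 s vs 60 s). Prefill time grows with prompt length while decode time stays fixed, so the faster decoder's lead shrinks and then reverses.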

#mlx #llama.cpp #apple-silicon

LLM Hacker News Mar 14, 2026 2 min read

Hacker News zeroes in on Anthropic's standard-priced 1M context rollout for Claude Opus 4.6 and Sonnet 4.6

A March 13, 2026 Hacker News thread focused on Anthropic's general-availability update for 1M-token context in Claude Opus 4.6 and Sonnet 4.6, especially the removal of long-context premiums. The release also raises media limits to 600 images or PDF pages and brings 1M-token context to Claude Code for Max, Team, and Enterprise users.

#anthropic #claude #context-window

LLM Mar 13, 2026 2 min read

Google introduces the Developer Knowledge API and MCP Server for Gemini Code Assist workflows

Google introduced the Developer Knowledge API and an open-source MCP Server on February 4, 2026. The tools are meant to connect internal documentation, public URLs, and other team knowledge sources to Gemini Code Assist and AI-agent workflows with less custom plumbing.

#google #gemini #developers

LLM Mar 13, 2026 2 min read

Karpathy says autoresearch cut nanochat Time to GPT-2 by about 11%

Andrej Karpathy says his autoresearch setup reduced nanochat's Time to GPT-2 from 2.02 hours to 1.80 hours. He said the agent explored roughly 700 changes over about two days and found around 20 additive improvements, but the result should still be read as a source claim rather than an independently audited benchmark.
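The headline's "about 11%" follows from the two reported times. This is a quick check of the stated figures and adds no independent measurement:

```latex
\frac{2.02\,\text{h} - 1.80\,\text{h}}{2.02\,\text{h}} = \frac{0.22}{2.02} \approx 0.109 \approx 11\%
```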

#karpathy #autoresearch #nanochat

LLM Reddit Mar 13, 2026 2 min read

r/singularity highlights a paper arguing the LM head wastes most of the training signal

A Reddit thread surfaced arXiv paper 2603.10145, which argues that the output layer of a language model is not just a softmax expressivity issue but an optimization bottleneck that suppresses 95-99% of the gradient norm. The discussion centered on whether better head designs could unlock more efficient LLM training.

#backpropagation #lm-head #optimization

LLM Reddit Mar 13, 2026 2 min read

r/MachineLearning debates whether LLM benchmark papers age out before they matter

A high-scoring discussion in r/MachineLearning asks what benchmarking papers are for when proprietary models change monthly and old versions disappear. The strongest replies argued that model rankings go stale fast, but the datasets and failure cases can remain useful as durable eval assets.

#benchmarks #evaluation #llm-research

LLM Hacker News Mar 13, 2026 2 min read

Hacker News examines Percepta's claim that transformers can execute programs internally

Percepta's March 11 post says it built a computer inside a transformer that can execute arbitrary C programs for millions of steps with exponentially faster inference via 2D attention heads. HN readers saw a provocative research direction, but they also asked for clearer writing, harder benchmarks, and evidence that the idea scales.

#transformers #inference #llm-research

LLM Hacker News Mar 13, 2026 2 min read

Hacker News spots CanIRun.ai, a browser-side local AI compatibility checker

CanIRun.ai runs entirely in the browser, detects GPU, CPU, and RAM through WebGL, WebGPU, and navigator APIs, and estimates which quantized models fit your machine. HN readers liked the idea but immediately pushed on missing hardware entries, calibration, and reverse-lookup features.
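The core of a "does this quantized model fit?" check can be sketched as back-of-the-envelope memory arithmetic. The Python below is a hypothetical rule of thumb and not CanIRun.ai's actual method. It assumes weight memory is roughly parameters times bits per weight divided by 8, plus a flat overhead margin (the `overhead_fraction` parameter, assumed here to be 20%) for KV cache, activations, and runtime.

```python
def estimated_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB: parameters x bits per weight / 8 bytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

def fits(params_billion: float, bits_per_weight: float, available_gb: float,
         overhead_fraction: float = 0.2) -> tuple[bool, float]:
    """Hypothetical heuristic: weights plus a flat overhead margin must fit in memory.

    Returns (fits, estimated_gb_needed).
    """
    needed = estimated_weights_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= available_gb, needed

# Example: a machine with 16 GB usable for the model (assumed value).
for params in (8, 70):
    ok, needed = fits(params, bits_per_weight=4, available_gb=16)
    print(f"{params}B @ 4-bit: ~{needed:.1f} GB needed, fits={ok}")
```

Under these assumptions, an 8B model at 4 bits needs about 4.8 GB and fits in 16 GB, while a 70B model needs about 42 GB and does not. Real tools also have to account for context length, quantization format details, and how much memory the OS and GPU driver actually expose. This is where calibration questions like those raised on HN come in.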

#local-ai #llm-inference #hardware

LLM Mar 13, 2026 2 min read

NVIDIA releases open Nemotron 3 Super with 1M context and up to 5x higher throughput for agentic AI

NVIDIA introduced Nemotron 3 Super on March 11, 2026 as an open 120B-parameter model built for agentic AI systems. The company says the model tackles long-context cost and reasoning overhead with a 1M-token window, a hybrid MoE design, and up to 5x higher throughput.

#nvidia #nemotron #agentic-ai

LLM Mar 13, 2026 2 min read

Google opens Gemini Embedding 2 preview as its first natively multimodal embedding model

Google has put Gemini Embedding 2 into public preview through the Gemini API and Vertex AI. The model is Google’s first natively multimodal embedding system, combining text, image, video, audio, and document inputs in one embedding space.
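A shared embedding space means vectors from different modalities can be compared directly, typically by cosine similarity. The sketch below does not call the Gemini API. The 4-dimensional vectors are toy, hypothetical values standing in for real embeddings. It assumes only that the model places related content, such as a text query and a matching image, close together in the one space.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors (hypothetical), as if produced by one multimodal embedding model.
text_query = [0.9, 0.1, 0.0, 0.4]
image_doc = [0.8, 0.2, 0.1, 0.5]
audio_doc = [0.0, 0.9, 0.7, 0.1]

# Rank candidates from different modalities against the text query.
ranked = sorted(
    [("image_doc", image_doc), ("audio_doc", audio_doc)],
    key=lambda item: cosine_similarity(text_query, item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Because every input lands in the same vector space, one similarity function ranks images, audio, and text together, without a separate index or model per modality.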

#google #gemini #embeddings