A March 14, 2026 r/LocalLLaMA post outlined a CUTLASS and FlashInfer patch for SM120 Blackwell workstations, claiming major gains for Qwen3.5-397B NVFP4 inference and linking the work to FlashInfer PR #2786.
StepFun released more than a model card: the Step-3.5-Flash-SFT dataset is now on Hugging Face. The repo bundles raw JSON data, tokenizer snapshots, and StepTronOSS-oriented compiled shards, while the Reddit discussion focused on reproducibility, reasoning traces, and the implications of the dual-license setup.
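For readers who want to poke at the release, a minimal sketch using the standard Hugging Face `datasets` API is below; the repo id "stepfun-ai/Step-3.5-Flash-SFT" is an assumption based on the dataset name, and the raw JSON layout may require a config name or data files argument.

```python
# Minimal sketch for inspecting the SFT release; the repo id below is a
# hypothetical guess from the dataset name, not a confirmed path.
from datasets import load_dataset

ds = load_dataset("stepfun-ai/Step-3.5-Flash-SFT", split="train")  # hypothetical repo id
print(ds[0])  # inspect one record, e.g. prompt/response and any reasoning-trace fields
```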
A r/LocalLLaMA field report showed how a narrowly scoped local inference workload was tuned for throughput: the author reported roughly 2,000 tokens per second while classifying markdown documents with Qwen 3.5 27B, and the comment thread turned the post into a practical optimization discussion.
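The usual route to aggregate numbers like that is continuous batching with short, deterministic outputs. A minimal sketch with vLLM's offline engine follows; the model id "Qwen/Qwen3.5-27B" is an assumption, and real throughput depends on hardware, quantization, and batch composition.

```python
# Batched markdown classification via vLLM's offline engine; the model id is
# a hypothetical placeholder for the 27B checkpoint the post used.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3.5-27B")  # hypothetical model id
params = SamplingParams(temperature=0.0, max_tokens=8)  # short labels keep decode cheap

docs = ["# Quarterly report\n...", "# Install guide\n..."]
prompts = [f"Classify this markdown document as report/guide/other:\n{d}\nLabel:" for d in docs]
outputs = llm.generate(prompts, params)  # all prompts are scheduled in one batched pass
for out in outputs:
    print(out.outputs[0].text.strip())
```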
Anthropic says 1M context is now generally available for Opus 4.6 and Sonnet 4.6 with standard pricing, no long-context premium, and media limits expanded to 600 images or PDF pages. Hacker News treated the announcement as a practical deployment story rather than a simple spec bump.
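If the announcement holds, long-context calls should go through the standard Messages API with no special header. A minimal sketch with the Anthropic Python SDK is below; the model id string "claude-opus-4-6" is an assumption, and per the GA framing no long-context beta flag should be needed.

```python
# Long-context request sketch with the Anthropic SDK; the model id is a
# hypothetical guess at how Opus 4.6 would be named.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
msg = client.messages.create(
    model="claude-opus-4-6",  # hypothetical model id
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the attached corpus..."}],
)
print(msg.content[0].text)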
Google is rolling out new Gemini beta features for Docs, Sheets, Slides, and Drive to Google AI Ultra and Pro subscribers. The update lets Gemini create and edit documents using content pulled from files, emails, and the web, while Drive adds AI Overviews and a new Ask Gemini flow.
Perplexity said on March 11, 2026 that its Sandbox API will become both an Agent API tool and a standalone service. Existing docs already frame Agent API as a multi-provider interface with explicit tool configuration, so the update pushes code execution closer to a first-class orchestration primitive.
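To make "explicit tool configuration" concrete, here is a purely illustrative payload shape; the endpoint, field names, and tool ids are hypothetical and do not reflect Perplexity's documented schema.

```python
# Illustrative only: what declaring code execution as a first-class tool in a
# request payload might look like. Endpoint and schema are hypothetical.
import os
import requests

payload = {
    "model": "sonar-agent",            # hypothetical model name
    "tools": [{"type": "sandbox"}],    # hypothetical: code execution declared as a tool
    "input": "Plot the CSV at ./data.csv and report the max value.",
}
resp = requests.post(
    "https://api.perplexity.ai/agent",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
    json=payload,
)
print(resp.json())
```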
Together AI said on March 13, 2026 that v2 of Open Deep Research is fully free and open source. The companion blog describes a planner and self-reflection workflow for multi-hop web research and ships code plus evaluation assets for developers.
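The planner-plus-reflection pattern the blog describes reduces to a loop of proposing sub-queries, gathering evidence, and critiquing coverage before replanning. A generic sketch follows; the `llm_plan`, `llm_reflect`, and `search` helpers are hypothetical stand-ins, not Open Deep Research's actual interfaces.

```python
# Generic planner/self-reflection loop for multi-hop web research; the three
# callables are hypothetical stand-ins for LLM and search backends.
def deep_research(question, llm_plan, llm_reflect, search, max_rounds=3):
    notes = []
    plan = llm_plan(question, notes)                  # propose sub-queries for this hop
    for _ in range(max_rounds):
        for query in plan:
            notes.append(search(query))               # gather web evidence per sub-query
        verdict, plan = llm_reflect(question, notes)  # critique coverage, replan the gaps
        if verdict == "sufficient":
            break
    return notes
```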
A r/MachineLearning post argues that Meta’s COCONUT results may owe more to curriculum design and sequential processing than to the headline mechanism of recycling hidden states as latent thought tokens.
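For context on the mechanism under debate, a minimal sketch of latent-thought recycling is shown below: instead of decoding a token, the final hidden state is appended as the next input embedding. It assumes a decoder whose forward accepts `inputs_embeds` and can return hidden states, in the style of Hugging Face models; shapes and the step count are illustrative.

```python
# Sketch of recycling hidden states as "latent thought tokens": feed the last
# position's final hidden state back in as the next input embedding.
import torch

def latent_thought_steps(model, inputs_embeds, n_latent=4):
    embeds = inputs_embeds                                # (batch, seq, d_model)
    for _ in range(n_latent):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]    # final position's top-layer state
        embeds = torch.cat([embeds, last_hidden], dim=1)  # recycle it as the next "token"
    return embeds
```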
A LocalLLaMA post claims a QLoRA-tuned 14B Qwen coder model can beat frontier proprietary models on Ada compilation tasks, reviving interest in domain-specific coding models for niche but high-stakes languages.
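A fine-tune like that typically means the standard QLoRA recipe: NF4 quantization of the frozen base plus low-rank adapters. The sketch below uses the usual `bitsandbytes` and `peft` setup; the base model id is an assumption, and target modules vary by architecture.

```python
# Standard QLoRA setup: 4-bit NF4 base model with trainable LoRA adapters.
# The base model id is an assumed placeholder for a 14B Qwen coder checkpoint.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                  # NF4 quantization, the "Q" in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B", quantization_config=bnb  # assumed base model id
)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)              # only the adapter weights train
```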
A Hacker News thread amplified a March 12 analysis arguing that LLM coding progress looks much weaker when measured by maintainer merge decisions rather than test-passing SWE-bench scores.
The arXiv paper Ares, submitted on March 9, 2026, proposes dynamically selecting reasoning effort at each step of a multi-step LLM agent run. The authors report up to 52.7% lower reasoning token usage versus fixed high-effort settings, with only minimal drops in task success.
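The core idea is a controller that spends reasoning tokens only where a step looks hard. A generic sketch in that spirit is below; the difficulty heuristic and the two effort levels are invented for illustration, not Ares's actual policy.

```python
# Generic per-step effort controller: cheap reasoning on easy steps, expensive
# reasoning on hard ones. The heuristic and effort levels are hypothetical.
def run_agent(task, agent_step, estimate_difficulty, max_steps=10):
    state = task
    for _ in range(max_steps):
        effort = "high" if estimate_difficulty(state) > 0.5 else "low"  # choose per step
        state, done = agent_step(state, reasoning_effort=effort)        # cheap steps save tokens
        if done:
            return state
    return state
```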
IBM unveiled Granite 4.0 1B Speech on March 9, 2026 as a compact multilingual speech-language model for ASR and bidirectional speech translation. The company says it improves English transcription accuracy over its predecessor while cutting model size in half and adding Japanese support.
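For the ASR side, a minimal sketch via the Hugging Face pipeline API is below; the repo id "ibm-granite/granite-4.0-1b-speech" is an assumption based on IBM's naming pattern, and the model may also expose a chat-style translation interface not shown here.

```python
# Minimal ASR sketch with the transformers pipeline; the repo id is an
# assumed guess from IBM's Granite naming conventions.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="ibm-granite/granite-4.0-1b-speech")  # assumed id
print(asr("meeting_clip.wav")["text"])  # transcribe a local audio file
```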