LLM

LLM Reddit Apr 11, 2026 2 min read

Dante-2B pitches an Italian-first open model instead of an English-first fine-tune

A developer on r/MachineLearning shared phase-one details for Dante-2B, a 2.1B Italian/English model trained from scratch with a tokenizer tuned for Italian morphology and token efficiency.

#llm #tokenizer #multilingual

LLM Apr 11, 2026 2 min read

GitHub will use more Copilot interaction data for model training by default

GitHub said that starting April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users will be used to train and improve AI models unless users opt out. Business and Enterprise plans are excluded, but the change materially expands how individual-tier Copilot usage can feed back into model development.

#github #copilot #privacy

LLM X/Twitter Apr 10, 2026 2 min read

Google Cloud brings autonomous embedding generation to BigQuery in preview

Google Cloud Tech highlighted BigQuery’s autonomous embedding generation preview on April 10, 2026, positioning it as a way to keep vector data in sync without separate ETL glue. The documentation shows automatically maintained embedding columns backed by Vertex AI models, plus a preview built-in model path inside BigQuery.

#google-cloud #bigquery #embeddings

LLM X/Twitter Apr 10, 2026 2 min read

Databricks argues memory, not reasoning alone, is the next scaling bottleneck for AI agents

On April 10, 2026, Databricks AI Research published "Memory Scaling for AI Agents," arguing that agent performance can improve as external memory grows. The post reports gains in both accuracy and efficiency from labeled examples, raw conversation logs, and organizational knowledge.

#databricks #ai-agents #memory

LLM X/Twitter Apr 10, 2026 2 min read

Claude for Word brings tracked-change editing into Microsoft documents

Claude said on April 10, 2026, that Claude for Word is now in beta for Team and Enterprise plans. The add-in drafts, edits, and revises Word files from a sidebar while preserving formatting and returning reviewable tracked changes.

#anthropic #claude #microsoft-word

LLM Reddit Apr 10, 2026 2 min read

LocalLLaMA Benchmarks Qwen3.5-122B at 198 tok/s on Dual RTX PRO 6000 Blackwell

A high-engagement LocalLLaMA post shared reproducible benchmark data showing Qwen3.5-122B NVFP4 decoding around 198 tok/s on a dual RTX PRO 6000 Blackwell system using SGLang b12x+NEXTN and a PCIe switch topology.

#qwen #blackwell #inference

LLM X/Twitter Apr 10, 2026 1 min read

vLLM Lands in the First MLPerf Vision-Language Benchmark Submission

vLLM said NVIDIA used the framework for the first MLPerf vision-language benchmark submission built on Qwen3-VL. NVIDIA’s accompanying blog places that result inside a broader Blackwell Ultra push that claims up to 2.7x throughput gains and more than 60% lower token cost on the same infrastructure for some workloads.

#vllm #mlperf #benchmark

LLM Reddit Apr 10, 2026 2 min read

Reddit Welcomes llama.cpp Tensor Parallelism, With an Experimental Warning Label

A high-scoring LocalLLaMA thread treated merged PR #19378 as a meaningful step toward more practical multi-GPU inference in llama.cpp. The catch is that the new --split-mode tensor path is still explicitly experimental, strongest today on CUDA, and still rough on ROCm and Vulkan.

#llama-cpp #tensor-parallelism #multi-gpu

LLM Hacker News Apr 10, 2026 2 min read

Hacker News Zeroes In on Research-Driven Coding Agents

A Hacker News discussion focused on SkyPilot's argument that coding agents work better when they read papers and competing implementations before editing code. In the reported llama.cpp experiments, that research-first loop produced 5 viable optimizations and improved TinyLlama text generation by 15% on x86 and 5% on ARM for about $29.

#coding-agents #llama-cpp #skypilot

LLM X/Twitter Apr 9, 2026 1 min read

Google DeepMind says Gemma 4 passed 10M downloads in its first week

On April 9, 2026, Google DeepMind said on X that Gemma 4 crossed 10M downloads in its first week and that the Gemma family overall has topped 500M downloads. Google positions Gemma 4 as an open model family built for reasoning, agentic workflows, and efficient deployment on local hardware.

#google-deepmind #gemma #open-models

LLM X/Twitter Apr 9, 2026 1 min read

Anthropic explains Managed Agents architecture for long-running Claude workloads

On April 8, 2026, Anthropic highlighted a new engineering post describing Managed Agents, its hosted service for long-running agent work on the Claude Platform. Anthropic says the system separates session, harness, and sandbox layers so agents can recover more cleanly from failure and connect to customer infrastructure with fewer assumptions.

#anthropic #managed-agents #agents

LLM X/Twitter Apr 9, 2026 1 min read

OpenAI adds a $100 ChatGPT Pro tier and reshapes Codex usage limits

On April 9, 2026, OpenAI said on X that it is introducing a new $100/month ChatGPT Pro tier aimed at heavier Codex use. OpenAI says the existing $200 Pro tier will remain the highest-usage option while Plus usage is being rebalanced toward more sessions across a week.

#openai #chatgpt #codex