A developer on r/MachineLearning shared phase-one details for Dante-2B, a 2.1B-parameter Italian/English model trained from scratch with a tokenizer tuned for Italian morphology and token efficiency.
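The post itself doesn't include code, but the tokenizer angle is easy to illustrate. Below is a minimal sketch, using the Hugging Face tokenizers library, of training a byte-level BPE tokenizer and checking its fertility (tokens per word) on an Italian sentence; the corpus file, vocabulary size, and test sentence are illustrative assumptions, not details from the Dante-2B post.

```python
# Minimal sketch of the kind of tokenizer work described above. The corpus
# path, vocab size, and test sentence are assumptions for illustration.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import ByteLevel

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = ByteLevel(add_prefix_space=True)

trainer = BpeTrainer(vocab_size=32_000, special_tokens=["[UNK]", "<s>", "</s>"])
tokenizer.train(files=["italian_english_corpus.txt"], trainer=trainer)  # hypothetical corpus

# Fertility (tokens per whitespace word) is a common proxy for token
# efficiency; inflection-heavy Italian tends to score worse on
# English-centric tokenizers, which is what a tuned vocab targets.
sentence = "Le parole italiane hanno molte forme flesse."
encoding = tokenizer.encode(sentence)
fertility = len(encoding.ids) / len(sentence.split())
print(f"{len(encoding.ids)} tokens, fertility = {fertility:.2f}")
```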
GitHub said that starting April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users will be used to train and improve AI models unless users opt out. Business and Enterprise plans are excluded, but the change materially expands how individual-tier Copilot usage can feed back into model development.
Google Cloud Tech highlighted BigQuery’s autonomous embedding generation preview on April 10, 2026, positioning it as a way to keep vector data in sync without separate ETL glue. The documentation shows automatically maintained embedding columns backed by Vertex AI models, plus a built-in model path inside BigQuery that is also in preview.
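For readers who want the shape of the workflow today, here is a minimal sketch of the manual path the preview is meant to automate: calling ML.GENERATE_EMBEDDING over a Vertex AI-backed remote model from the Python client. The dataset, table, and model names are hypothetical, and the preview's auto-maintained-column DDL is deliberately not guessed at here.

```python
# Manual equivalent of what the preview automates: generating embeddings
# in BigQuery SQL via a Vertex AI-backed remote model. All object names
# below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `my_dataset.text_embedding_model`,      -- remote model over Vertex AI
  (SELECT body AS content FROM `my_dataset.docs`)
)
"""
for row in client.query(sql).result():
    # ml_generate_embedding_result is the output vector column.
    print(row["content"][:40], len(row["ml_generate_embedding_result"]))
```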
On April 10, 2026, Databricks AI Research published “Memory Scaling for AI Agents,” arguing that agent performance can improve as external memory grows. The post reports gains in both accuracy and efficiency from labeled examples, raw conversation logs, and organizational knowledge.
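As a conceptual sketch only (not Databricks' implementation), the core idea is an external store that grows with use and is queried for relevant records at each step. A toy version with token-overlap retrieval might look like this; a production system would swap in embeddings and a vector store.

```python
# Conceptual sketch of a growing external agent memory. Retrieval here is
# naive token overlap, standing in for embedding similarity search.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    records: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Labeled examples, conversation logs, and org knowledge all land
        # in the same store in this toy version.
        self.records.append(text)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = set(query.lower().split())
        scored = sorted(
            self.records,
            key=lambda r: len(q & set(r.lower().split())),
            reverse=True,
        )
        return scored[:k]

memory = AgentMemory()
memory.add("Ticket 112: retry with exponential backoff fixed the timeout.")
memory.add("Org policy: never store credentials in task logs.")
print(memory.retrieve("how should the agent handle a timeout?"))
```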
Claude said on April 10, 2026 that Claude for Word is now in beta for Team and Enterprise plans. The add-in drafts, edits, and revises Word files from a sidebar while preserving formatting and returning reviewable tracked changes.
A high-engagement LocalLLaMA post shared reproducible benchmark data showing Qwen3.5-122B NVFP4 decoding around 198 tok/s on a dual RTX PRO 6000 Blackwell system using SGLang b12x+NEXTN and a PCIe switch topology.
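A quick way to sanity-check numbers like these on your own hardware is to time a completion against the server's OpenAI-compatible endpoint, which SGLang exposes. In the sketch below the URL, served-model name, and prompt are assumptions, and the measurement includes prefill time, so it slightly understates pure decode speed.

```python
# Rough tokens/sec check against a locally served OpenAI-compatible
# endpoint. Port 30000 is SGLang's default; the model name is hypothetical.
import time
import requests

start = time.perf_counter()
resp = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "qwen3.5-122b-nvfp4",   # hypothetical served-model name
        "prompt": "Explain PCIe switch topologies in one paragraph.",
        "max_tokens": 512,
    },
    timeout=600,
)
elapsed = time.perf_counter() - start

usage = resp.json()["usage"]
print(f"{usage['completion_tokens'] / elapsed:.1f} tok/s "
      f"({usage['completion_tokens']} tokens in {elapsed:.1f}s)")
```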
vLLM said NVIDIA used the framework for the first MLPerf vision-language benchmark submission built on Qwen3-VL. NVIDIA’s accompanying blog places that result inside a broader Blackwell Ultra push that claims up to 2.7x throughput gains and more than 60% lower token cost for some workloads on the same infrastructure.
A high-scoring LocalLLaMA thread treated merged PR #19378 as a meaningful step toward more practical multi-GPU inference in llama.cpp. The catch is that the new `--split-mode tensor` path is still explicitly experimental, strongest today on CUDA, and still rough on ROCm and Vulkan.
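For anyone wanting to try it, a hypothetical invocation (wrapped in Python's subprocess for scripting) might look like the following; `llama-cli` and `-ngl` are standard llama.cpp CLI usage, while the model path and prompt are placeholders.

```python
# Hypothetical invocation of the experimental split mode; model path and
# prompt are placeholders. Per the thread, expect the best behavior on CUDA.
import subprocess

subprocess.run(
    [
        "llama-cli",
        "-m", "model.gguf",           # placeholder model path
        "-ngl", "99",                 # offload all layers to GPU
        "--split-mode", "tensor",     # new experimental tensor-split path
        "-p", "Hello",
    ],
    check=True,
)
```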
A Hacker News discussion focused on SkyPilot's argument that coding agents work better when they read papers and competing implementations before editing code. In the reported llama.cpp experiments, that research-first loop produced 5 viable optimizations and improved TinyLlama text generation by 15% on x86 and 5% on ARM for about $29.
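The loop itself is simple to state. Here is a conceptual sketch with every helper stubbed out; none of this is SkyPilot's actual harness, it just shows the research-before-edit ordering and the keep-only-measured-wins gate.

```python
# Conceptual sketch of a research-first agent loop. All helpers are stubs.
def read_references(task: str) -> list[str]:
    # Stand-in for fetching papers and competing implementations.
    return [f"notes on {task} from paper A", "notes from repo B"]

def propose_patch(task: str, notes: list[str]) -> str:
    # Stand-in for the LLM call that drafts an optimization.
    return f"patch for {task} informed by {len(notes)} sources"

def benchmark(patch: str) -> float:
    # Stand-in for running the benchmark suite; returns a speedup ratio.
    return 1.15

task = "speed up TinyLlama text generation"
notes = read_references(task)          # research first...
patch = propose_patch(task, notes)     # ...then edit code
if benchmark(patch) > 1.0:             # keep only measured wins
    print("accepted:", patch)
```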
On April 9, 2026, Google DeepMind said on X that Gemma 4 crossed 10M downloads in its first week and that the Gemma family overall has topped 500M downloads. Google positions Gemma 4 as an open model family built for reasoning, agentic workflows, and efficient deployment on local hardware.
On April 8, 2026, Anthropic highlighted a new engineering post describing Managed Agents, its hosted service for long-running agent work on the Claude Platform. Anthropic says the system separates session, harness, and sandbox layers so agents can recover more cleanly from failure and connect to customer infrastructure with fewer assumptions.
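Anthropic's post describes the layering, not code, so the following is a conceptual illustration rather than their API: the point is that long-lived state lives in the session, the harness owns retries, and a sandbox can crash and be replaced without losing the session.

```python
# Conceptual illustration of session/harness/sandbox separation; not
# Anthropic's implementation. The session survives a sandbox crash
# because durable state lives a layer up.
class Sandbox:
    def run(self, step: str) -> str:
        if "flaky" in step:
            raise RuntimeError("sandbox died")
        return f"result of {step}"

class Harness:
    """Drives the agent loop; owns retries, not long-lived state."""
    def execute(self, sandbox: Sandbox, step: str) -> str:
        try:
            return sandbox.run(step)
        except RuntimeError:
            # Recover by replacing the sandbox, not the whole session.
            return Sandbox().run(step.replace("flaky", "retried"))

class Session:
    """Long-lived state: transcript, goals, customer connections."""
    def __init__(self) -> None:
        self.transcript: list[str] = []

    def step(self, harness: Harness, step: str) -> None:
        self.transcript.append(harness.execute(Sandbox(), step))

session = Session()
session.step(Harness(), "flaky build step")
print(session.transcript)
```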
On April 9, 2026, OpenAI said on X that it is introducing a new $100/month ChatGPT Pro tier aimed at heavier Codex use. The company says the existing $200 Pro tier will remain the highest-usage option, while Plus usage is being rebalanced toward more sessions per week.