Together AI announced on March 19, 2026 that its fine-tuning service now supports tool-call, reasoning, and vision-language workflows. The linked Together AI blog post also details support for 100B+ parameter models, datasets up to 100GB, up to 6x higher throughput on large MoE models, and upfront cost and ETA estimates.
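The announcement does not show the expected dataset schema; for a tool-call workflow, a plausible training record in the widely used OpenAI-style messages format (the field names here are assumptions, not Together AI's documented spec) could be assembled like this:

```python
import json

# Hypothetical tool-call training record in the OpenAI-style "messages"
# schema common to fine-tuning services; the exact fields Together AI
# expects are not specified in the announcement.
example = {
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin right now?"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Berlin"}),
                },
            }],
        },
        {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 7}'},
        {"role": "assistant", "content": "It's currently 7 °C in Berlin."},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# Fine-tuning datasets are typically uploaded as JSONL, one record per line.
with open("toolcall_dataset.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```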
A benchmark thread on r/LocalLLaMA compared ROCm 7 nightlies and Vulkan backends for llama.cpp on an AMD Instinct MI50, arguing that Vulkan wins short dense workloads while ROCm pulls ahead on long-context and some MoE scenarios.
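Reproducing that kind of comparison usually means building llama.cpp twice, once with the Vulkan backend and once with ROCm/HIP, then running its bundled llama-bench tool against the same GGUF file. A minimal Python driver, with the binary and model paths as placeholders:

```python
import subprocess

MODEL = "models/example-7b-q4_k_m.gguf"  # any GGUF; path is illustrative

# Paths assume two separate llama.cpp builds: one configured with the
# Vulkan backend and one with ROCm/HIP enabled.
BINARIES = {
    "vulkan": "./build-vulkan/bin/llama-bench",
    "rocm": "./build-rocm/bin/llama-bench",
}

for backend, binary in BINARIES.items():
    # -p 512 measures prompt processing (PP512), -n 128 measures token
    # generation, -ngl 99 offloads all layers to the GPU.
    result = subprocess.run(
        [binary, "-m", MODEL, "-p", "512", "-n", "128", "-ngl", "99"],
        capture_output=True, text=True, check=True,
    )
    print(f"=== {backend} ===\n{result.stdout}")
```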
A Hacker News discussion highlighted Flash-MoE, a pure C/Metal inference stack that streams Qwen3.5-397B-A17B from SSD and reaches interactive speeds on a 48GB M3 Max laptop.
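The thread does not include Flash-MoE's source, but the core idea behind SSD streaming for MoE models is that memory-mapped weights are only paged in for the experts a token actually routes to. A toy numpy illustration of that access pattern, with all names and shapes invented:

```python
import numpy as np

N_EXPERTS, D_IN, D_OUT = 64, 1024, 1024  # invented sizes for the demo

# One-time setup: write dummy expert weights to disk, standing in for the
# hundreds of GB a real MoE checkpoint would occupy.
w = np.lib.format.open_memmap("experts.npy", mode="w+", dtype=np.float16,
                              shape=(N_EXPERTS, D_IN, D_OUT))
w[:] = 0.01
w.flush()

# Memory-map the file: nothing is read from SSD until a slice is touched,
# so routing a token to 2 of 64 experts pages in only ~3% of the weights.
experts = np.load("experts.npy", mmap_mode="r")

x = np.random.randn(D_IN).astype(np.float16)
routed = [3, 41]  # expert indices a router might select for this token
y = sum(x @ experts[i] for i in routed)  # only these slices hit the disk
print(y.shape)  # (1024,)
```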
On Feb. 19, 2026, Google introduced Gemini 3.1 Pro and began rolling it out across AI Studio, Gemini CLI, Antigravity, Android Studio, Vertex AI, Gemini Enterprise, the Gemini app, and NotebookLM. Google says the model reached 77.1% on ARC-AGI-2, more than doubling Gemini 3 Pro’s reasoning performance on that benchmark.
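Availability in AI Studio and Vertex AI implies Gemini API access; a minimal call with the google-genai Python SDK, where the model identifier is a guess rather than a confirmed ID:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# "gemini-3.1-pro" is a guess at the model ID; check the model list in
# AI Studio for the identifier Google actually publishes.
response = client.models.generate_content(
    model="gemini-3.1-pro",
    contents="Summarize what the ARC-AGI-2 benchmark measures in two sentences.",
)
print(response.text)
```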
A rerun benchmark posted to r/LocalLLaMA argues that Apple’s M5 Max shows its clearest gains in prompt processing rather than in raw token generation. The post reports 2,845 tok/s prompt processing (PP512) and 92.2 tok/s generation for Qwen 3.5 35B-A3B MoE, though these remain community measurements rather than independent lab benchmarks.
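PP512 (prefill over a 512-token prompt) and generation throughput measure different phases of inference, which is why the post reports them separately. A rough way to time both locally with llama-cpp-python, with the model path illustrative and the numbers anecdotal:

```python
import time
from llama_cpp import Llama

# Model path is illustrative; n_gpu_layers=-1 offloads all layers.
llm = Llama(model_path="models/example.gguf", n_ctx=2048,
            n_gpu_layers=-1, verbose=False)

prompt = "word " * 512  # roughly a 512-token prompt, analogous to PP512

# Prefill (prompt processing): generate a single token so the timing is
# dominated by processing the prompt itself.
t0 = time.perf_counter()
out = llm(prompt, max_tokens=1)
t_prefill = time.perf_counter() - t0
n_prompt = out["usage"]["prompt_tokens"]

# Decode (generation): rerun with more tokens; llama-cpp-python reuses the
# matching prompt prefix from its KV cache, so this pass is dominated by
# per-token generation cost (exact reuse behavior varies by version).
t1 = time.perf_counter()
llm(prompt, max_tokens=128)
t_decode = time.perf_counter() - t1

print(f"prompt processing: ~{n_prompt / t_prefill:.0f} tok/s")
print(f"generation:        ~{128 / t_decode:.0f} tok/s")
```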
A Show HN post points to llm-circuit-finder, a toolkit that duplicates selected transformer layers inside GGUF models and claims sizable reasoning gains without changing weights or running fine-tuning. The strongest benchmark numbers come from the project author’s own evaluations rather than independent validation.
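The toolkit's own interface is not shown in the post, but the underlying trick, repeating decoder layers while leaving their weights untouched, can be sketched against a Hugging Face Llama-style model. The checkpoint name and layer range below are placeholders, and this is a generic stand-in rather than llm-circuit-finder's actual method:

```python
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; Llama-architecture models expose their decoder
# stack as model.model.layers (an nn.ModuleList).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

DUPLICATE = range(8, 12)  # which layers to repeat; purely illustrative

new_layers = []
for i, layer in enumerate(model.model.layers):
    new_layers.append(layer)
    if i in DUPLICATE:
        # deepcopy keeps the weights bit-identical: nothing is trained or
        # modified, the repeated layer simply runs a second time.
        new_layers.append(copy.deepcopy(layer))

model.model.layers = nn.ModuleList(new_layers)
model.config.num_hidden_layers = len(new_layers)

# Repair per-layer cache indices (present in recent transformers releases)
# so generation with a KV cache still addresses distinct slots.
for idx, layer in enumerate(model.model.layers):
    layer.self_attn.layer_idx = idx
```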
OpenCode drew 1,238 points and 614 comments on Hacker News, highlighting an open-source AI coding agent that spans terminal, IDE, and desktop clients. The project site emphasizes broad provider support, LSP integration, multi-session workflows, and a privacy-first posture.
A Reddit post in r/LocalLLaMA introduces a GGUF release of Qwen3.5-122B-A10B Uncensored (Aggressive) alongside new K_P quants. The author claims 0/465 refusals and zero capability loss, but those results are presented as the author’s own tests rather than independent verification.
GitHub has launched a public preview that lets teams assign Jira issues directly to the Copilot coding agent and receive AI-generated draft pull requests in GitHub. The company says the integration reduces context switching while preserving existing review and approval controls.
Google has introduced Gemini 3.1 Flash-Lite in preview through Google AI Studio and Vertex AI. The company is positioning it as the fastest and most cost-efficient model in the Gemini 3 family for large-scale inference jobs.
OpenAI Developers announced on March 20, 2026 that verified university students in the United States and Canada can claim $100 in Codex credits. OpenAI’s support page says that equals 2,500 ChatGPT credits, requires student verification through SheerID, and expires 12 months after the grant date.
Google AI Studio promoted Gemini Embedding 2 in a March 12, 2026 X post, and Google’s March 10 blog post says the model maps text, images, video, audio, and documents into a single embedding space. Google says it is in public preview through the Gemini API and Vertex AI and is designed for multimodal retrieval and classification.
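A minimal retrieval-style call with the google-genai Python SDK; the model identifier below is a guess at the preview ID, not a confirmed string:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# "gemini-embedding-2" is a guess at the preview model ID; check the
# Gemini API model list for the published identifier.
result = client.models.embed_content(
    model="gemini-embedding-2",
    contents=["multimodal retrieval", "image and text search"],
)
for emb in result.embeddings:
    print(len(emb.values))  # dimensionality of each returned vector
```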