LocalLLaMA reacted as if dense models had suddenly become fun again. The official Qwen numbers were strong, but the real community energy came from people immediately asking about quants, GGUF builds, and whether 27B had become the practical sweet spot. By crawl time on April 25, 2026, the thread had 1,688 points and 603 comments.
#multimodal
Why it matters: retrieval stacks are being pulled from text-only search into multimodal memory. Google AI Studio said Gemini Embedding 2 is generally available and covers text, image, video, audio, and documents through one model path.
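If the one-model-path claim holds up, a retrieval stack could embed text queries and image documents through the same endpoint. Here is a minimal sketch with the google-genai Python SDK; the model id "gemini-embedding-2" and image inputs to embed_content are assumptions based on the announcement, not confirmed API surface:

```python
# Sketch only: the model id "gemini-embedding-2" and passing image Parts
# to embed_content are assumptions from the announcement, not documented.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Text side: embed_content is the SDK's real embedding call.
text_resp = client.models.embed_content(
    model="gemini-embedding-2",  # hypothetical GA model id
    contents="hardware store receipt, March 2026",
)

# Image side (hypothetical): reusing the Part type from generate_content.
image_resp = client.models.embed_content(
    model="gemini-embedding-2",
    contents=types.Part.from_bytes(
        data=open("receipt.jpg", "rb").read(), mime_type="image/jpeg"
    ),
)

# A single vector space would let text queries retrieve image documents.
print(len(text_resp.embeddings[0].values), len(image_resp.embeddings[0].values))
```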
Why it matters: Anthropic is moving Claude into visual work products, not just text and code. The tweet says Claude Design is powered by Opus 4.7 and is rolling out in research preview to Pro, Max, Team, and Enterprise plans.
MM-WebAgent tackles a real flaw in AI-made webpages: models can generate pieces, but the page often loses visual coherence. The paper adds hierarchical planning and self-reflection, introduces a benchmark, and releases code and data so builders can test multimodal webpage agents beyond code-only output.
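The mechanics generalize: plan the layout top-down, generate per section, then let a vision model critique a rendered screenshot and drive revisions. A hypothetical sketch of that loop; the callables, prompts, and PASS convention are illustrative, not MM-WebAgent's actual interfaces:

```python
# Hypothetical sketch of a plan -> generate -> reflect loop; prompts and
# the PASS convention are illustrative, not MM-WebAgent's actual interfaces.
def build_page(spec, llm, vlm, screenshot, max_rounds=3):
    """llm/vlm are text and vision model callables; screenshot renders HTML."""
    # Hierarchical planning: outline sections before generating any code,
    # so each piece is written with the whole layout in view.
    sections = llm(f"List the sections for this webpage, one per line:\n{spec}").splitlines()
    html = "\n".join(
        llm(f"Write HTML/CSS for section {s!r}, consistent with {sections}")
        for s in sections
    )
    # Self-reflection: a vision model judges a rendered screenshot for
    # visual coherence and drives revisions until the page passes.
    for _ in range(max_rounds):
        verdict = vlm(screenshot(html), "Reply PASS if visually coherent, else list defects.")
        if verdict.strip().startswith("PASS"):
            break
        html = llm(f"Revise the page to fix these defects:\n{verdict}\n\n{html}")
    return html
```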
Why it matters: Alibaba is putting a small-active-parameter multimodal coding model into open weights rather than keeping it API-only. The tweet says Qwen3.6-35B-A3B has 35B total parameters, 3B active parameters, and an Apache 2.0 license; the blog reports 73.4 on SWE-bench Verified and 51.5 on Terminal-Bench 2.0.
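For local experimentation, Apache-licensed weights should load through standard Hugging Face tooling once published. A minimal text-only sketch with transformers; the repo id "Qwen/Qwen3.6-35B-A3B" follows Qwen's usual naming but is an assumption until the release confirms it, and the multimodal input path (not shown) would go through the model's processor:

```python
# Sketch under one assumption: "Qwen/Qwen3.6-35B-A3B" is a guessed repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-35B-A3B"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # all 35B params stay resident (~70 GB in bf16);
    device_map="auto",           # the 3B active params help speed, not memory
)

messages = [{"role": "user", "content": "Write a binary search in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```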
A 54-point Reddit post flagged merged PR #19441 as the moment qwen3-omni-moe and qwen3-asr support reached llama.cpp, with commenters focused on local multimodal and ASR use cases.
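Once builds that include the merge ship, usage should follow llama.cpp's existing multimodal CLI. A hedged sketch; the GGUF filenames are placeholders for whatever quants the community publishes, and whether llama-mtmd-cli accepts --audio depends on your build:

```sh
# Sketch only: filenames are placeholders, flags depend on build version.
./llama-mtmd-cli -m qwen3-omni-moe-Q4_K_M.gguf \
    --mmproj qwen3-omni-mmproj.gguf \
    --audio clip.wav \
    -p "Transcribe the clip, then summarize it in one sentence."
```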
AI at Meta said on April 8, 2026 that Muse Spark, the first model from Meta Superintelligence Labs, is a natively multimodal reasoning model with tool use, visual chain of thought, and multi-agent orchestration. The official announcement says it already powers the Meta AI app and meta.ai, is rolling out across WhatsApp, Instagram, Facebook, Messenger, and AI glasses, and is entering private-preview API access for selected partners.
Google said on March 26, 2026 that Search Live is expanding to every language and country where AI Mode is already available. The rollout reaches more than 200 countries and territories and uses Gemini 3.1 Flash Live to make search more conversational, voice-first, and camera-aware.
A Hacker News thread amplified Meta's launch of Muse Spark, a multimodal reasoning model with tool use, visual chain of thought, and a parallel-agent Contemplating mode.
Parlor, a local multimodal assistant, surfaced on both Show HN and LocalLLaMA. It runs speech and vision understanding with Gemma 4 E2B and uses Kokoro for text-to-speech, combined with browser voice activity detection and streaming audio playback, all on-device. The README reports roughly 2.5-3.0 seconds of end-to-end latency and about 83 tokens/sec decode speed on an Apple M3 Pro.
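The reported numbers make for an easy budget check: at 83 tokens/sec, pure decoding is a small slice of the 2.5-3.0 second total, with the rest going to VAD, ASR, prefill, and TTS. A back-of-envelope sketch; the 60-token reply length and fully serialized stages are assumptions for illustration:

```python
# Back-of-envelope budget from the reported figures; the 60-token reply
# and fully serialized stages are assumptions. Only the totals and the
# decode speed come from the README.
DECODE_TPS = 83             # reported decode speed on an Apple M3 Pro
TOTAL_RANGE = (2.5, 3.0)    # reported end-to-end latency, seconds

reply_tokens = 60                      # assumed short spoken reply
decode_s = reply_tokens / DECODE_TPS   # ~0.72 s of pure decoding
for total in TOTAL_RANGE:
    rest = total - decode_s
    print(f"total {total:.1f}s -> {rest:.2f}s left for VAD + ASR + prefill + TTS")
```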