Azure says Phi-4-Reasoning-Vision-15B is now available in Microsoft Foundry. Microsoft positions the 15B model as a compact multimodal system that can switch reasoning on or off for document analysis, chart understanding, and GUI-grounded agent workflows.
Google says Cinematic Video Overviews are rolling out to NotebookLM Ultra users in English. The company says the feature combines Gemini 3, Nano Banana Pro, and Veo 3 to generate more immersive videos than the earlier narrated-slide format.
Mistral has launched Mistral 3, a new open multimodal family with dense 14B, 8B, and 3B models under Apache 2.0, plus a larger Mistral Large 3. The company says the lineup was trained from scratch and tuned for both Blackwell NVL72 systems and single-node 8xA100 or 8xH100 deployments.
Google AI shared practical Gemini 3.1 Flash-Lite examples, including high-volume image sorting and business automation scenarios. The thread also points developers to preview access via Gemini API, Google AI Studio, and Vertex AI.
Google announced Nano Banana 2 on X, describing it as its best image generation and editing model so far. The rollout note says availability is expanding across Gemini App, Search, and Google’s developer and creativity tools.
A high-engagement LocalLLaMA post on March 4, 2026 discussed Microsoft’s open-weight Phi-4-Reasoning-Vision-15B and focused on practical deployment tradeoffs for local multimodal inference.
Google has released Nano Banana 2, a new AI image generation model combining advanced world knowledge, subject consistency, and production-ready specs at Flash speed. The release signals Google's push to compete directly in enterprise image generation.
On February 26, 2026 (UTC), Google DeepMind said on X that Nano Banana 2 can turn instructions into data-rich infographics and educational diagrams. The post also emphasized Gemini world knowledge and real-time web-grounded generation.
A widely upvoted Reddit post highlighted Google’s new Nano Banana 2 (Gemini 3.1 Flash Image), which combines Pro-level image capabilities with faster generation and broad product/API rollout.
A post on r/MachineLearning resonated widely: an independent researcher with limited compute developed a genuinely novel multimodal-learning improvement, but the paper was rejected from CVPR primarily because the author could not afford to run comparisons against large-scale models.
A high-scoring r/LocalLLaMA thread surfaced Qwen3.5-397B-A17B, an open-weight multimodal model whose Hugging Face model card lists 397B total parameters with 17B activated per token and an extended context of up to roughly 1M tokens.