#multimodal

AI Hacker News 1d ago 1 min read

FLUX 3 Pushes Past Image Generation Into Video, Audio, and Action

The discussion moved quickly from sample quality to the larger claim: one multimodal backbone for visual generation and action prediction.

#flux #multimodal #video-generation

LLM X/Twitter Jul 17, 2026 1 min read

Thinking Machines opens Inkling weights for multimodal reasoning

Open-weight multimodal models just gained a serious new entrant. Thinking Machines released Inkling with full weights, 64K and 256K context options, and a direct fine-tuning path through Tinker.

#thinking-machines #inkling #open-weights

LLM Hacker News Jul 16, 2026 2 min read

Inkling shifts the open-weight question toward fine-tuning

HN readers focused less on leaderboard dominance and more on the package: Thinking Machines Lab is offering, as an open-weight base, a multimodal MoE with controllable reasoning effort and Tinker-based fine-tuning.

#thinking-machines #open-weights #multimodal

LLM X/Twitter Jun 13, 2026 1 min read

MiniMax M3 weights hit Hugging Face with 428B total parameters

MiniMax has moved M3 from model teaser to open-weight distribution. The Hugging Face card lists about 428B total parameters, 23B activated parameters, and a 1M-token context window.

#minimax #open-weights #multimodal

LLM Hacker News Jun 4, 2026 1 min read

Gemma 4 12B puts the spotlight on encoder-free multimodal local AI

The thread’s energy centered on the architecture claim: what does “encoder-free” really mean for a 12B multimodal model?

#gemma #multimodal #open-weights

LLM X/Twitter Jun 4, 2026 1 min read

Gemma 4 12B removes separate encoders for laptop-scale multimodal AI

Local multimodal AI is moving into the 12B class. Google Gemma introduced Gemma 4 12B under Apache 2.0, describing a unified encoder-free design for image, audio, and text inputs.

#gemma #google #open-models

LLM May 29, 2026 2 min read

Gemini 3.5 Flash reaches GA as Google turns Search into an agent surface

Google’s I/O 2026 AI story is about distribution as much as models. Gemini 3.5 Flash is now generally available across API, Antigravity, Android Studio, enterprise tools, Search, and the Gemini app, while Gemini Omni Flash brings video generation into the same push.

#google #gemini #agents

AI X/Twitter May 21, 2026 1 min read

Google DeepMind Launches Gemini Omni: Generate Video from Any Input

At Google I/O 2026, Google DeepMind unveiled Gemini Omni, its first model capable of generating video from any input, including text, images, audio, and video. Combining Gemini's intelligence with Google's generative media systems, it is available now through the Gemini app and YouTube Shorts.

#google #gemini #video-generation

AI Reddit May 20, 2026 1 min read

ByteDance Releases Lance: 3B Unified Multimodal Model Matching 7B Benchmarks

ByteDance Research has open-sourced Lance, a 3B-parameter unified multimodal model that handles image and video generation, editing, and understanding in a single framework. Its benchmark scores match or exceed those of 7B models, which are more than twice its size.

#bytedance #lance #multimodal

AI Hacker News May 10, 2026 1 min read

Google Expands Gemini API File Search to Multimodal RAG

Google has updated the Gemini API File Search tool to support multimodal content including images, audio, and video, making it easier for developers to build efficient, verifiable RAG systems.

#google #gemini #rag

Sciences Reddit May 4, 2026 1 min read

IBM MAMMAL Beats AlphaFold 3 on 9 Biological Benchmarks with Multi-Modal Biology Model

IBM Research has published MAMMAL, a multi-modal model that unifies protein, molecule, and gene data. It achieves state-of-the-art results on 9 of 11 biological benchmarks and outperforms AlphaFold 3 on several drug-discovery tasks.

#ibm #mammal #drug-discovery

AI Reddit May 2, 2026 1 min read

Community Discovers Claude Mythos Supports Image Generation

The r/singularity community found that Claude Mythos can generate image outputs, reportedly marking Anthropic's first foray into image generation.

#anthropic #claude #image-generation