#multimodal

LLM sources.twitter Mar 25, 2026 1 min read

NVIDIA puts Nemotron Nano 12B v2 VL forward as a lightweight open model for on-premises video understanding

On March 25, 2026, NVIDIA said Nemotron Nano 12B v2 VL supports on-premises video understanding and, by the company's own account, comes close to 30B-class alternatives on the MediaPerf benchmark with a smaller footprint. NVIDIA's model card presents it as a commercially usable multimodal model for multi-image reasoning, video understanding, visual Q&A, and summarization.

#nvidia #nemotron #multimodal

LLM Mar 24, 2026 1 min read

Microsoft Research releases Phi-4-reasoning-vision-15B, putting multimodal reasoning efficiency front and center

On March 4, 2026, Microsoft Research released Phi-4-reasoning-vision-15B, a 15-billion-parameter open-weight model. The company said the model is designed to deliver competitive performance in multimodal reasoning, math and science tasks, and computer-use scenarios while lowering compute cost.

#microsoft #phi-4 #multimodal

LLM sources.twitter Mar 24, 2026 1 min read

OpenAI expands GPT-5.4 mini to ChatGPT, Codex, and the API

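On March 17, 2026, OpenAI announced on X that GPT-5.4 mini had launched in ChatGPT, Codex, and the API. The company introduced mini as a model for faster coding and multimodal work, and its accompanying official post also added GPT-5.4 nano, available only through the API.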

#openai #gpt-5.4 #chatgpt

LLM sources.twitter Mar 22, 2026 1 min read

Google unveils Gemini Embedding 2, mapping text, images, audio, video, and documents into a single vector space

Google AI Studio introduced Gemini Embedding 2 in an X post on 2026-03-12, and Google's 2026-03-10 blog post explains that the model maps text, images, video, audio, and documents into a single embedding space. Google said the model is available in public preview through the Gemini API and Vertex AI, and highlighted multimodal retrieval and classification as its main use cases.
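A single embedding space means a text query can be compared directly against image, audio, or document vectors with one similarity function. The minimal sketch below assumes the embeddings have already been produced (it does not call the Gemini API) and ranks items by cosine similarity; the item IDs, the 3-dimensional toy vectors, and the helper names `cosine` and `rank` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank(query: List[float], corpus: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Score every item against the query and return the k most similar, best first."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for embeddings of different modalities
# that live in the same space (real embeddings have far more dimensions).
corpus = {
    "image:cat.jpg": [0.9, 0.1, 0.0],
    "audio:meow.wav": [0.8, 0.2, 0.1],
    "pdf:tax_form.pdf": [0.0, 0.1, 0.95],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of the text "a cat"

result = rank(query, corpus, k=2)
print(result)
```

Because every modality shares one space, the same `rank` function serves text-to-image, text-to-audio, and text-to-document search; in practice a vector index would replace the linear scan.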

#google #gemini #embeddings

LLM Mar 22, 2026 1 min read

OpenAI releases GPT-5.4 mini and nano, expanding its small-model lineup for coding and subagents

On March 17, 2026, OpenAI released GPT-5.4 mini and nano. The company described both as low-latency small models built for coding, tool use, multimodal reasoning, and high-volume subagent workloads.

#openai #gpt-5.4 #coding

LLM Reddit Mar 19, 2026 1 min read

LocalLLaMA on Mistral Small 4: Instruct, Reasoning, and Devstral folded into a single MoE

A March 16, 2026 r/LocalLLaMA post about Mistral Small 4 had 606 points and 232 comments as of the latest available crawl. Mistral's model card describes a 119B-class MoE with 4 active experts, 256k context, multimodal input, and per-request switching of reasoning on or off.
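"4 active experts" refers to sparse routing: each token is processed by only a few experts instead of all of them, which is how a 119B-class MoE keeps per-token compute well below its total parameter count. The plain-Python sketch below shows generic top-k routing under stated assumptions; the expert count (8), the toy scaling experts, the router scores, and the helper names `top_k_route` and `moe_forward` are all hypothetical, and the source does not describe Mistral's actual router or expert layout.

```python
import math
from typing import Callable, List, Sequence, Tuple

# An "expert" here is any function mapping a vector to a vector
# (a toy stand-in for the feed-forward block of a real MoE layer).
Expert = Callable[[List[float]], List[float]]


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize over just those k."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: List[float], experts: Sequence[Expert],
                gate_logits: Sequence[float], k: int = 4) -> List[float]:
    """Run only the k routed experts and return the gate-weighted sum of their outputs."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(gate_logits, k):
        y = experts[idx](x)  # experts that were not routed to are never evaluated
        for j, v in enumerate(y):
            out[j] += weight * v
    return out


# Hypothetical toy setup: 8 experts, where expert i scales its input by i + 1.
experts: List[Expert] = [lambda x, s=s: [s * v for v in x] for s in range(1, 9)]
gate_logits = [0.1, 2.0, 0.3, 1.5, 0.0, 1.0, 0.2, 0.5]  # made-up router scores for one token

print(top_k_route(gate_logits, k=4))  # experts 1, 3, 5, 7 are active
print(moe_forward([1.0, -2.0], experts, gate_logits, k=4))
```

In a real model the router scores come from a learned layer applied to each token, so different tokens activate different subsets of experts.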

#mistral #multimodal #reasoning

LLM sources.twitter Mar 17, 2026 1 min read

Google DeepMind expands multimodal retrieval with the Gemini Embedding 2 preview

Google DeepMind announced on X that Gemini Embedding 2 is available in preview through the Gemini API and Vertex AI. It is the first fully multimodal embedding model built on the Gemini architecture, and it aims to unify retrieval across text, images, video, audio, and documents in a single layer.

#google-deepmind #gemini #embeddings

LLM sources.twitter Mar 17, 2026 1 min read

OpenAI expands its small-model stack with GPT-5.4 mini and nano

OpenAI said on X that it is rolling out GPT-5.4 mini to ChatGPT, Codex, and the API, and releasing GPT-5.4 nano as a small model for low-cost API workloads. The company positions both as faster small models for coding, multimodal tasks, and agent sub-workflows.

#openai #gpt-5.4 #codex

LLM sources.twitter Mar 17, 2026 1 min read

Mistral AI partners with NVIDIA to co-develop open frontier models and joins the Nemotron Coalition

On March 16, 2026, Mistral AI announced a strategic partnership with NVIDIA to co-develop frontier open-source AI models. A follow-up official Mistral post says Mistral is joining the NVIDIA Nemotron Coalition as a founding member, contributing large-scale model development and multimodal capabilities.

#mistral #nvidia #open-models

LLM Reddit Mar 17, 2026 1 min read

r/LocalLLaMA boosts Mistral Small 4, a 119B MoE that combines 256k context with a reasoning mode

On March 16, 2026, the Mistral Small 4 link on r/LocalLLaMA drew 504 points and 196 comments. According to the Hugging Face model card, the model combines 119B parameters, 4 active experts, 256k context, multimodal input, and switchable reasoning in a single release.

#mistral #open-models #multimodal

LLM Mar 16, 2026 1 min read

Google opens the Gemini Embedding 2 public preview: the first natively multimodal embedding model

Google released Gemini Embedding 2 in public preview on March 10, 2026. The company said the model handles text, images, and mixed multimodal documents such as PDFs in a single embedding space, raising benchmark scores to 68.32 and 53.3 while keeping the same price and number of dimensions.

#google #embeddings #multimodal

AI Reddit Mar 14, 2026 2 min read

r/singularity spotlights LongCat-Image-Edit-Turbo, Meituan's 8-step open-source image editing model

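r/singularity highlighted Meituan's LongCat-Image-Edit-Turbo, a distilled open-source image editor that claims high-quality results in just 8 NFEs (number of function evaluations). It ships as an Apache 2.0 model on Hugging Face alongside a public arXiv report, and the community is also scrutinizing how its benchmark results are framed.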

#meituan #image-editing #open-source