Meta Llama 4 Ushers in Native Multimodal AI Era with 10M Token Context
Native Multimodal Innovation
Meta has announced the Llama 4 series. Meta describes Llama 4 Scout and Llama 4 Maverick as the first open-weight natively multimodal models, designed from the ground up to handle text, images, and video in a single integrated model.
Llama 4 Maverick: 17B Active Parameters
Along with Scout, Llama 4 Maverick is one of Meta's first models to use a Mixture-of-Experts (MoE) architecture. It has 128 experts and activates 17 billion parameters per token, out of roughly 400 billion total parameters.
According to Meta, Maverick beats GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, which the company cites as evidence that it is the best multimodal model in its class.
Llama 4 Scout: 10 Million Token Context
Llama 4 Scout raises the supported context length from 128K tokens in Llama 3 to 10 million tokens, a figure Meta calls industry-leading. In principle, that is enough to hold thousands of pages of documents, long video content, or a large codebase in a single context.
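To get a feel for what these token counts mean, a token budget can be converted into approximate pages with simple arithmetic. The sketch below is only a back-of-the-envelope estimate: the ratios of about 0.75 English words per token and 500 words per page are common rules of thumb assumed here, not figures for the Llama 4 tokenizer, and real counts vary with language, formatting, and tokenizer.

```python
# Back-of-the-envelope context budget check.
# ASSUMPTIONS: ~0.75 English words per token and ~500 words per page are
# generic rules of thumb, not measured Llama 4 tokenizer figures.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500

CONTEXT_WINDOWS = {
    "128K tokens": 128_000,
    "10M tokens": 10_000_000,
}


def pages_that_fit(context_tokens: int) -> int:
    """Approximate number of text pages that fit in a context window."""
    words = context_tokens * WORDS_PER_TOKEN
    return int(words // WORDS_PER_PAGE)


def fits(num_pages: int, context_tokens: int) -> bool:
    """True if num_pages of plain text should fit, under the assumed ratios."""
    tokens_needed = num_pages * WORDS_PER_PAGE / WORDS_PER_TOKEN
    return tokens_needed <= context_tokens


if __name__ == "__main__":
    for name, tokens in CONTEXT_WINDOWS.items():
        print(f"{name}: ~{pages_that_fit(tokens):,} pages")
```

Under these assumptions a 128K window holds roughly 192 pages, while a 10M window holds roughly 15,000 pages, which is why the jump matters for whole-repository or multi-document work.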
Significance of Open-Weight Strategy
Meta has released the Llama 4 models as open weights, so researchers and developers can download, fine-tune, and build on them under the terms of Meta's Llama 4 Community License. This is a major difference in transparency and accessibility compared with closed commercial models such as GPT, Claude, and Gemini, which are offered only through APIs and products.
Impact on AI Ecosystem
Llama 4 is a notable step toward democratizing multimodal AI. Multimodal capabilities at this level, previously offered mainly through closed models from companies like OpenAI, Google, and Anthropic, can now be downloaded, used, and customized by anyone who accepts the license terms.
The MoE architecture also matters for efficiency. A router sends each token to only a small subset of experts, so only a fraction of the model's total parameters is used for any given token. This lowers the compute cost per token while, according to Meta, preserving the quality of a much larger model.
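The routing idea can be illustrated with a generic top-k Mixture-of-Experts layer. This is a minimal sketch of the general technique, not Meta's Llama 4 implementation: the `ToyExpert` class (which just scales its input instead of running an MLP), the random router weights, and the choice of `k` are all hypothetical, and Llama 4's exact routing scheme is not described here. The point it shows is that each token evaluates only `k` experts, so the rest of the parameters sit idle for that token.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a short list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyExpert:
    """Hypothetical stand-in for an expert feed-forward network.

    A real expert is an MLP; this one just scales its input, and `calls`
    records how many times it was actually evaluated.
    """

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return [self.scale * v for v in x]


def moe_forward(x, router_weights, experts, k=1):
    """Send one token vector x to its top-k experts and mix their outputs.

    router_weights[i] is expert i's routing vector; its logit is dot(w, x).
    Only the k highest-scoring experts are evaluated; the others are skipped,
    which is where the compute saving comes from.
    """
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top


if __name__ == "__main__":
    random.seed(0)
    num_experts, dim, k = 8, 4, 1
    experts = [ToyExpert(scale=i + 1) for i in range(num_experts)]
    router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
    for _ in range(100):
        token = [random.gauss(0, 1) for _ in range(dim)]
        moe_forward(token, router, experts, k=k)
    total_calls = sum(e.calls for e in experts)
    print(f"expert evaluations: {total_calls} for 100 tokens "
          f"({k} of {num_experts} experts per token)")
```

With `k=1` and 8 experts, 100 tokens trigger exactly 100 expert evaluations instead of 800, which is the same trade-off, at toy scale, that lets a model hold many more parameters than it uses per token.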