Meta has announced the launch of Llama 4, its latest suite of open-weight artificial intelligence models.
Highlights
The release includes three models—Llama 4 Scout, Maverick, and the still-training Behemoth—which expand the Llama model family with improvements in performance, multimodal capability, and handling of complex tasks across a wide range of domains.
The models were unveiled over a weekend, signaling a strategic move to respond quickly to global developments in the AI space.
Trained on large volumes of unlabeled text, images, and videos, the Llama 4 series introduces Meta’s first use of the Mixture of Experts (MoE) architecture.
This architecture routes each input to a small subset of specialized sub-models ("experts"), so only a fraction of the network runs on any given token, improving efficiency without shrinking overall capacity. Maverick, for instance, holds 400 billion parameters in total across 128 expert modules but activates only about 17 billion per inference.
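The routing idea can be illustrated with a toy sketch. This is not Meta's implementation—the expert count, hidden size, and top-k value below are made-up toy numbers—but it shows the core mechanism: a learned router scores each token against every expert and dispatches it to the best-scoring few, so only those experts' weights are used.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # Maverick reportedly uses 128; 8 keeps the demo small
TOP_K = 2       # experts consulted per token (toy value)
D_MODEL = 16    # hidden size (toy value)

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS))

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-k experts only."""
    scores = token @ router                  # one score per expert
    top = np.argsort(scores)[-TOP_K:]        # indices of the best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Weighted sum of the selected experts' outputs; the other
    # N_EXPERTS - TOP_K experts never run for this token.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(D_MODEL))
print(out.shape)  # (16,)
```

Because the unselected experts are skipped entirely, compute per token scales with the active parameters, not the total—which is how a 400-billion-parameter model can run inference at roughly 17-billion-parameter cost.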
Model Overview and Technical Capabilities
- Scout: A lightweight model designed for summarization, long-context reasoning, and document analysis. It supports a 10 million-token context window, allowing it to process extensive codebases or texts efficiently, even on a single Nvidia H100 GPU.
- Maverick: A general-purpose assistant with strengths in multilingual and creative tasks. It requires a more advanced deployment setup, such as a full Nvidia H100 DGX system.
- Behemoth: Still in training, this model is expected to be Meta’s largest and most capable to date, with 288 billion active parameters and nearly two trillion total. Preliminary internal benchmarks suggest strong performance in STEM domains, with competitive results against models like GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro.
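The gap between active and total parameters in the figures above is worth making explicit. A back-of-envelope calculation (a sketch only—the real activation pattern depends on how experts are distributed across layers) shows how sparse these models are per token:

```python
# Figures as reported above: (active parameters, total parameters).
maverick_active, maverick_total = 17e9, 400e9
behemoth_active, behemoth_total = 288e9, 2e12  # "nearly two trillion"

maverick_ratio = maverick_active / maverick_total
behemoth_ratio = behemoth_active / behemoth_total

print(f"Maverick activates {maverick_ratio:.2%} of its parameters per token")
print(f"Behemoth activates {behemoth_ratio:.2%} of its parameters per token")
```

So Maverick touches only about one twenty-fourth of its weights on any single inference, while Behemoth, despite its far larger total, still activates well under a sixth.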
Meta’s internal evaluations indicate that while Maverick performs competitively against many current flagship models, it does not consistently surpass the latest releases from competitors such as Google and OpenAI.
Enhancements in Llama 4 focus not only on performance but also on how the models handle politically sensitive or ideologically charged queries.
Shifts in Content Moderation and Model Alignment
In contrast to earlier versions, the Llama 4 models have been adjusted to reduce refusals to engage with contentious topics.
Meta describes these changes as efforts to ensure the models remain more responsive and balanced, aiming to provide factual, neutral responses without avoiding difficult questions.
The company emphasizes that this approach is part of its broader commitment to minimizing perceived ideological bias.
Licensing and Distribution Restrictions
With the launch of Llama 4, Meta has also introduced stricter licensing terms. The models are restricted from use or distribution within the European Union, likely due to ongoing concerns regarding the region’s AI governance and data privacy frameworks.
Additionally, companies with more than 700 million monthly active users must seek a special license from Meta to access the models, with approvals granted at the company’s discretion.
The models are accessible via Llama.com and platforms such as Hugging Face. Meta AI, the company’s assistant integrated into WhatsApp, Messenger, and Instagram, has already incorporated Llama 4 in more than 40 countries.
However, advanced multimodal capabilities, such as image and video comprehension, are currently limited to U.S.-based users and only available in English.
Development Motivations and Competitive Context
Meta’s development timeline for Llama 4 appears to have been influenced by rising global competition—particularly the emergence of DeepSeek, a Chinese AI developer whose models have received attention for their efficiency and capabilities.
In response, Meta is reported to have formed internal “war rooms” to analyze and replicate aspects of DeepSeek’s performance strategies.
Infrastructure and Investment
The training of Llama 4 relied on a record-breaking infrastructure of over 100,000 Nvidia H100 GPUs, highlighting the scale and ambition of Meta’s AI efforts.
In 2024 alone, the company’s infrastructure spending is projected to reach $40 billion, representing a 42% increase from the previous year. This investment reflects Meta’s long-term commitment to AI leadership and large-scale model development.
Organizational Changes
Amid these developments, Joelle Pineau, head of Meta’s AI research division, has announced her resignation effective May 30, 2025.
Pineau played a key role in the development of foundational tools like the Llama model family, and her departure marks a notable leadership transition during a period of rapid technological advancement for the company.
Model Limitations
Despite its strengths, Llama 4 does not yet include a dedicated reasoning mode comparable to OpenAI's o-series models, which spend additional inference-time computation to improve factual accuracy and answer reliability.
Nonetheless, improvements in responsiveness, scalability, and context handling suggest that Meta is positioning the Llama series for increasingly dynamic, real-time interactions.