OpenAI has released two new open-weight large language models, gpt-oss-120b and gpt-oss-20b, marking its first major open-weight release since GPT-2 more than five years ago.
Highlights
- First open-weight release in 5+ years: OpenAI publishes gpt-oss-120b and gpt-oss-20b under Apache 2.0, allowing full commercial use.
- Scalable across hardware tiers: 120b is optimized for a single H100 GPU, while 20b runs on laptops with 16GB RAM — democratizing access.
- Mixture-of-Experts (MoE) architecture: Both models activate a subset of parameters per token for efficient inference and reasoning.
- Competitive performance: Outperforms DeepSeek’s R1 and Qwen in some benchmarks, though still behind OpenAI’s own o-series models.
- High hallucination rates: The 120b and 20b models hallucinate on 49% and 53% of PersonQA questions, respectively, a tradeoff for openness and accessibility.
- Agentic capabilities: Supports tool use, chain-of-thought prompting, structured outputs, and adjustable reasoning — ideal for building autonomous AI agents.
- Safety-first release: Underwent evaluations under OpenAI's Preparedness Framework, which found the models fall below the capability thresholds that would have warranted restricting release.
- Available via major platforms: Deployable through Hugging Face, Amazon Bedrock, Azure AI Foundry, and Amazon SageMaker.
- Strategic shift: Signals OpenAI’s response to global competition and a pivot back toward open-source collaboration.
- Altman’s new stance: CEO Sam Altman admits OpenAI was “on the wrong side of history” regarding openness — now aiming to empower developers globally.
Both models are now available for download via Hugging Face under the permissive Apache 2.0 license.
Differences and Capabilities
The two models differ in scale, hardware requirements, and intended use.
- gpt-oss-120b is a large-scale model designed to run efficiently on a single NVIDIA H100 GPU. It uses a Mixture-of-Experts (MoE) architecture that activates roughly 5.1B parameters per token, delivering strong performance on reasoning tasks with efficient inference.
- gpt-oss-20b, a smaller variant, is optimized to run on consumer-grade hardware — such as laptops with 16GB RAM — making advanced reasoning AI more accessible to individual developers and smaller teams.
Benchmark Performance and Limitations
OpenAI reports that both models perform competitively with existing open-weight models. For example:
- On Codeforces (with tools), the 120b and 20b models earned Elo ratings of 2622 and 2516, respectively, outperforming DeepSeek's R1 model.
- In Humanity’s Last Exam, both models outperformed Qwen and DeepSeek but remained below the performance of OpenAI’s o-series models like o3 and o4-mini.
The models also demonstrate higher hallucination rates compared to OpenAI’s proprietary systems.
On PersonQA, gpt-oss-120b and gpt-oss-20b showed hallucination rates of 49% and 53%, respectively — significantly above the o1 model’s 15% and o4-mini’s 36%.
Architecture and Use in Agentic Workflows
Both models employ a Mixture-of-Experts (MoE) design, which activates only a small subset of parameters for each token to reduce compute overhead; a toy routing sketch follows the parameter figures below.
- gpt-oss-120b: ~5.1B active parameters per token
- gpt-oss-20b: ~3.6B active parameters per token
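To make the "subset of parameters per token" idea concrete, here is a minimal top-k MoE layer in PyTorch. It is a sketch of the general technique only: the dimensions, expert count, and gating scheme are invented for illustration and do not reflect gpt-oss's actual internals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Illustrative top-k Mixture-of-Experts layer.

    A small router scores every token against all experts, but only the
    top-k experts actually run, so the active parameter count per token
    is a fraction of the layer's total parameters.
    """

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        gate_logits = self.router(x)                      # (n_tokens, n_experts)
        weights, chosen = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e               # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer(d_model=64, n_experts=8, top_k=2)
tokens = torch.randn(4, 64)          # 4 tokens, embedding dim 64
print(layer(tokens).shape)           # torch.Size([4, 64])
```

Scaled up, this is how a model can hold a very large total weight count while each token pays only the compute cost of the few experts its router selects.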
The models support advanced reasoning and agentic workflows, including (see the request sketch after this list):
- Chain-of-thought prompting
- Tool use, such as code execution and web browsing
- Structured output generation
- Adjustable reasoning effort
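As a rough sketch of exercising adjustable reasoning effort in practice, the example below assumes gpt-oss-20b is already being served locally behind an OpenAI-compatible endpoint (for instance via vLLM or Ollama); the URL, port, and model identifier are placeholders, and the "Reasoning: high" system hint follows the models' published prompt convention.

```python
from openai import OpenAI

# Assumption: a local OpenAI-compatible server is already serving
# gpt-oss-20b; adjust base_url and model name to your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        # gpt-oss exposes adjustable reasoning effort (low/medium/high)
        # via a hint in the system prompt.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Is 221 prime? Explain briefly."},
    ],
)
print(response.choices[0].message.content)
```

Raising the effort level trades latency and token cost for longer chains of thought; lowering it suits quick, simple queries.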
Safety Measures and Responsible Release
Prior to release, OpenAI conducted extensive internal and third-party evaluations under its Preparedness Framework, testing for misuse in high-risk domains such as cybersecurity and biotechnology.
The results indicated that the models do not meet the criteria for “high capability” in dangerous applications, enabling OpenAI to proceed with open-weight deployment while maintaining its safety commitments.
Accessibility and Deployment Options
Beyond open access on Hugging Face, the gpt-oss models are also being integrated into major cloud platforms (a local-inference sketch follows the list):
- Amazon Bedrock
- Azure AI Foundry
- Amazon SageMaker
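For local experimentation outside those platforms, a minimal download-and-generate sketch with Hugging Face transformers might look like the following; the repo id openai/gpt-oss-20b matches the announced release, but the version requirements, hardware placement, and generation settings here are assumptions.

```python
from transformers import pipeline

# Assumption: a recent transformers version with gpt-oss support.
generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",   # use the dtype shipped with the checkpoint
    device_map="auto",    # place weights on available GPU(s)/CPU
)

messages = [{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}]
result = generator(messages, max_new_tokens=128)
# For chat-style input, generated_text holds the full message list;
# the last entry is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```

The 120b variant follows the same pattern but expects H100-class accelerator memory, while the 20b variant fits within roughly 16GB.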
OpenAI claims that gpt-oss-120b is up to 3× more cost-efficient than competitors like Gemini or DeepSeek’s R1 on AWS infrastructure — a potentially significant advantage for enterprises seeking scalable, transparent AI solutions.
The release comes at a time of increasing pressure from global AI labs — particularly in China — that are rapidly advancing open-weight models. DeepSeek’s R1, Alibaba’s Qwen, and Moonshot AI have all made substantial progress, prompting OpenAI to revisit its approach to openness.
CEO Sam Altman acknowledged this shift in direction, stating earlier this year that the company had been “on the wrong side of history” regarding open-source AI.
He emphasized OpenAI’s renewed commitment to its mission of ensuring that AGI benefits all of humanity by enabling broader developer access.
“We are excited for the world to be building on an open AI stack created in the United States, based on democratic values,” Altman said in a recent statement.