OpenAI has recently released its two new AI reasoning models, o3 and o4-mini, developed to handle complex tasks with a focus on deeper cognitive processing.
Highlights
These models are engineered to “pause and think” before producing output, marking a shift toward more intentional and layered responses in artificial intelligence systems.
With enhanced performance in coding, mathematics, science, and image understanding, the launch of these models reflects an evolving approach to multimodal and analytical AI.
The o3 model is currently OpenAI’s most advanced reasoning system.
It has demonstrated strong performance across industry benchmarks, including a 69.1% score on the SWE-bench Verified benchmark, which measures the ability to resolve real-world software engineering tasks without custom scaffolding.
This score compares to 49.3% for o3-mini and surpasses Claude 3.7 Sonnet’s 62.3%, indicating notable improvements in reasoning capability. Meanwhile, o4-mini achieved a 68.1% score on the same benchmark, delivering comparable accuracy at a lower operational cost.
Intentional Reasoning and Stepwise Output
Unlike traditional models that generate outputs reactively, o3 and o4-mini incorporate deliberate reasoning steps, allowing them to handle more intricate queries.
OpenAI has also introduced o4-mini-high, a variant of o4-mini that spends more time per response in exchange for improved reliability, which is particularly useful in critical workflows where accuracy is paramount; a sketch of the equivalent API setting follows below.
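For developers, this speed-versus-reliability trade-off maps onto a tunable request parameter rather than a separate model name. Here is a minimal sketch, assuming the official `openai` Python SDK and the documented `reasoning_effort` parameter for o-series models; exact model identifiers and parameter availability may vary by account:

```python
# Minimal sketch: requesting higher reasoning effort from o4-mini.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY in the
# environment; model names and parameter availability may vary by account.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="high",  # trade latency for more deliberate reasoning
    messages=[
        {"role": "user", "content": "Prove that the sum of two even integers is even."}
    ],
)

print(response.choices[0].message.content)
```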
Multimodal Reasoning Capabilities
These models support multimodal input, including text and images. Users can provide diagrams, handwritten notes, or low-quality visual content, which the models can interpret and integrate into their reasoning process.
This allows them to analyze layouts, rotate images, and draw conclusions from visuals in a way that complements text-based understanding.
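As a minimal sketch of what such a multimodal request can look like through the API, assuming the official `openai` Python SDK and a placeholder image URL:

```python
# Minimal sketch: passing an image alongside text for multimodal reasoning.
# Assumes the official `openai` Python SDK; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this whiteboard diagram describe?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/whiteboard.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```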
Integrated Tools and Real-Time Functionality
OpenAI has expanded the functionality of these models through tool integrations available within ChatGPT. Features include:
- Web browsing for retrieving live data
- Python code execution
- Image and file analysis
- Image generation
- Long-term memory features
These capabilities enable the models to assist in dynamic environments, enhancing their utility in technical and research-heavy tasks; a sketch of tool use through the developer API follows below.
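Some of these tools are also exposed to developers. The following is a hedged sketch, assuming the official `openai` Python SDK and the `web_search_preview` built-in tool type from OpenAI's Responses API documentation; tool names and availability may change:

```python
# Minimal sketch: letting a reasoning model invoke a built-in tool via the
# Responses API. Assumes the official `openai` Python SDK; the
# "web_search_preview" tool type follows current API docs and may change.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="o3",
    tools=[{"type": "web_search_preview"}],  # the model decides when to search
    input="Summarize today's top developments in battery research.",
)

print(response.output_text)
```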
Benchmark Performance
In standardized evaluations, o3 has shown considerable progress:
- 87.7% on GPQA Diamond, an expert-level science benchmark
- 71.7% on SWE-bench Verified, compared to 48.9% from OpenAI’s earlier o1 model
- Threefold improvement on ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence), compared to o1
Such improvements reflect the model’s capacity for structured and abstract problem-solving, aligning more closely with human-style reasoning.
Deployment and Access
The models are now available to:
- ChatGPT Pro, Plus, and Team users
- Developers via the Chat Completions API and the Responses API (a minimal sketch of each follows below)
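As referenced above, here is a minimal sketch of the same prompt sent through each API, assuming the official `openai` Python SDK; exact model identifiers may vary by account:

```python
# Minimal sketch: one prompt through both developer-facing APIs.
# Assumes the official `openai` Python SDK and an OPENAI_API_KEY in the
# environment.
from openai import OpenAI

client = OpenAI()

# Chat Completions API
chat = client.chat.completions.create(
    model="o3",
    messages=[{"role": "user", "content": "Explain big-O notation in one paragraph."}],
)
print(chat.choices[0].message.content)

# Responses API
resp = client.responses.create(
    model="o3",
    input="Explain big-O notation in one paragraph.",
)
print(resp.output_text)
```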
Pricing for usage has been set as follows:
| Model   | Input Token Cost | Output Token Cost |
|---------|------------------|-------------------|
| o3      | $10 / million    | $40 / million     |
| o4-mini | $1.10 / million  | $4.40 / million   |
This pricing structure allows developers to choose between high-end performance (o3) and more cost-efficient alternatives (o4-mini) depending on their project needs.
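To make that choice concrete, here is a back-of-the-envelope calculation for a hypothetical monthly workload of 5 million input tokens and 1 million output tokens, using the listed per-million-token rates:

```python
# Back-of-the-envelope cost comparison for a hypothetical workload of
# 5 million input tokens and 1 million output tokens per month.
# Rates are the per-million-token prices listed in the table above.
PRICES = {
    "o3":      {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10,  "output": 4.40},
}

input_mtok, output_mtok = 5.0, 1.0  # millions of tokens per month

for model, rate in PRICES.items():
    cost = input_mtok * rate["input"] + output_mtok * rate["output"]
    print(f"{model}: ${cost:,.2f}/month")

# o3:      5 * 10.00 + 1 * 40.00 = $90.00
# o4-mini: 5 * 1.10  + 1 * 4.40  = $9.90
```

Under this hypothetical workload, o4-mini comes out roughly nine times cheaper than o3.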
Strategic Release Decisions
Although OpenAI originally planned to fold o3's technology into a broader release, likely GPT-5, competitive pressure in the market influenced the decision to launch o3 and o4-mini as standalone models.
Internal discussions considered holding back for a more integrated rollout, but growing momentum among competitors (Anthropic, Meta, DeepSeek, and Google) played a role in the accelerated deployment.
Upcoming Developments
OpenAI has confirmed that o3-pro, a more resource-intensive variant of o3, will be released soon and will be exclusive to ChatGPT Pro users.
This version is expected to further advance reasoning performance and may serve as a stepping stone toward GPT-5, a unified model combining traditional and reasoning-oriented architectures.