OpenAI has introduced a new pricing tier called Flex processing, aimed at developers handling lower-priority tasks where speed and consistent availability are not critical.
Highlights
This beta feature is now available for the recently released o3 and o4-mini models and is positioned as a cost-effective option for non-production workloads.
By offering reduced rates for slower response times, OpenAI is targeting developers seeking to optimize costs while accessing advanced AI capabilities.
Designed for Non-Critical Workloads
Flex is intended for scenarios where immediate responsiveness is not essential—such as model evaluations, asynchronous data handling, and AI experimentation.
OpenAI emphasizes that this option is tailored for “non-production” environments, where performance trade-offs can be tolerated in exchange for lower costs.
Developers using Flex access the same models as those in standard tiers, but with lower system priority. This means responses may be delayed or, at times, temporarily unavailable, depending on platform load.
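Because Flex requests may be delayed or briefly unavailable, client code usually needs some form of retry. The following is a minimal sketch of exponential backoff, not OpenAI's recommended implementation: `ResourceUnavailable` is a hypothetical exception standing in for whatever "capacity unavailable" error the client library actually raises, and `call_with_backoff` and its parameters are illustrative names.

```python
import time


class ResourceUnavailable(Exception):
    """Hypothetical error standing in for a 'capacity unavailable' response."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() and retry with exponential backoff on ResourceUnavailable.

    `request` is any zero-argument callable that performs the API call.
    `sleep` is injectable so the waiting behaviour can be tested without delays.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ResourceUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In practice the `request` callable would wrap the real API call, and a long request timeout would typically accompany it, since slow responses are expected on this tier.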
Significant Reduction in Pricing
The new tier offers notable savings. For the o3 model:
- Input tokens are priced at $5 per million (down from $10).
- Output tokens cost $20 per million (down from $40).
For the o4-mini model:
- Input tokens are priced at $0.55 per million (down from $1.10).
- Output tokens are priced at $2.20 per million (down from $4.40).
These changes represent a 50% discount compared to standard API pricing, making the Flex tier an attractive option for budget-conscious developers.
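To make the savings concrete, here is a small sketch that estimates the cost of a job from the Flex prices listed above. The function names are illustrative, and the standard-tier figure simply assumes the stated 50% discount, meaning the standard rate is taken to be exactly twice the Flex rate.

```python
# Flex prices in USD per million tokens, taken from the figures in this article.
FLEX_PRICES = {
    "o3": {"input": 5.00, "output": 20.00},
    "o4-mini": {"input": 0.55, "output": 2.20},
}


def flex_cost(model, input_tokens, output_tokens):
    """Estimated Flex cost in USD for the given token counts."""
    price = FLEX_PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def standard_cost(model, input_tokens, output_tokens):
    """Estimated standard cost, assuming Flex is exactly a 50% discount."""
    return 2 * flex_cost(model, input_tokens, output_tokens)


# Example: an evaluation run on o3 with 2M input tokens and 500K output tokens.
print(flex_cost("o3", 2_000_000, 500_000))      # 20.0
print(standard_cost("o3", 2_000_000, 500_000))  # 40.0
```

Under these assumptions, the example job costs $20 on Flex versus $40 at standard rates.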
Competitive Timing in the AI Landscape
The launch of Flex comes amid increasing competition among AI providers to offer more affordable and scalable solutions.
On the same day, Google introduced Gemini 2.5 Flash, a lightweight reasoning model that competes in terms of both performance and cost-efficiency.
OpenAI’s Flex rollout appears strategically timed to appeal to developers managing large-scale workloads, where cost control is a primary concern.
ID Verification for Access
To access Flex and newer models like o3, OpenAI now requires ID verification for developers in usage tiers 1 through 3, the tiers assigned to accounts with lower overall API spend.
This verification step is part of OpenAI’s broader efforts to prevent misuse and enforce responsible usage. Verified users will also receive access to additional features, including streaming responses and reasoning summaries.
Multimodal Reasoning Capabilities
The o3 and o4-mini models bring enhanced multimodal reasoning capabilities. These models can interpret visual inputs such as sketches and whiteboard images, integrating them into broader analytical processes.
As part of their reasoning, they can zoom into, rotate, and otherwise analyze images, making them well-suited for workflows that combine text and visual data.
Full Tool Integration
Both models are compatible with the complete suite of ChatGPT tools, including:
- Web browsing
- Python execution
- Image analysis and generation
- File interpretation
This deep integration allows developers to handle diverse tasks without switching platforms, improving efficiency and workflow consistency.
Market Positioning
The Flex processing tier reflects OpenAI’s strategy to diversify its offerings in response to competitive pressures and varied developer needs.
By introducing a pricing model geared toward cost-sensitive, non-critical tasks, OpenAI aims to expand its user base while maintaining high-performance resources for mission-critical applications.
Resource Optimization and Developer Flexibility
Flex processing offers a clear tradeoff: reduced cost in exchange for lower processing priority.
For many developers—particularly those working on prototypes, tools, or backend analytics—this approach provides valuable access to advanced AI capabilities without the financial burden of full-speed services.