Recent advances in artificial intelligence models capable of complex reasoning—such as solving mathematical problems or writing code—may be approaching a period of slower progress, according to a new analysis from the nonprofit research institute Epoch AI.
Highlights
The report examines the trajectory of reasoning models and suggests that while performance gains have been significant, the pace of improvement could begin to plateau within the next year.
Reasoning models distinguish themselves from traditional AI systems by working through multi-step logic tasks rather than simply predicting outputs from patterns in their training data.
OpenAI’s “o3” model, for example, has demonstrated strong results on benchmarks focused on reasoning capabilities, outperforming earlier iterations.
Much of this improvement is attributed to the use of reinforcement learning, a method that refines model outputs through trial-and-error feedback after initial training.
Reinforcement Learning: A New Bottleneck?
Until recently, reinforcement learning (RL) has been applied using relatively modest computational resources. That trend is shifting.
OpenAI has indicated that it used approximately ten times more compute power to train o3 compared to o1, with much of the increase likely allocated to reinforcement learning.
Dan Roberts, a researcher at OpenAI, confirmed that the company plans to further scale RL in future models, potentially devoting more resources to that stage than to initial training.
Epoch’s report questions whether this approach can be sustained. Reinforcement learning has driven rapid gains, estimated at roughly a 10x performance improvement every 3 to 5 months, but such acceleration may not continue indefinitely.
By contrast, traditional training typically yields performance improvements that scale by a factor of four annually. Epoch’s analysis predicts that by 2026 the performance growth of reasoning models will converge with that of the broader category of AI systems, narrowing their current advantage.
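The gap between these two growth rates can be made concrete with a quick back-of-the-envelope annualization. This is a sketch only: the report's exact methodology is not disclosed, and `annual_factor` is an illustrative helper that simply compounds a per-period multiplier over 12 months.

```python
def annual_factor(multiplier: float, period_months: float) -> float:
    """Compound a per-period performance multiplier over 12 months."""
    return multiplier ** (12 / period_months)

# RL-driven reasoning gains: cited as 10x every 3 to 5 months
rl_fast = annual_factor(10, 3)  # 10x every 3 months -> 10,000x per year
rl_slow = annual_factor(10, 5)  # 10x every 5 months -> ~251x per year

# Traditional training: cited as roughly 4x per year
traditional = 4.0

print(f"RL gains, 3-month cadence: {rl_fast:,.0f}x/year")
print(f"RL gains, 5-month cadence: {rl_slow:,.0f}x/year")
print(f"Traditional scaling:       {traditional:.0f}x/year")
```

Even at the slower 5-month cadence, the annualized rate is two orders of magnitude above traditional scaling, which illustrates why Epoch treats the current pace as an outlier unlikely to persist.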
Resource Constraints and Diminishing Returns
The analysis also points to the high costs associated with reinforcement learning as a possible constraint on future progress.
These models require significant computational resources and extensive human oversight for tuning and experimentation, making them more expensive to develop and operate than conventional models.
Even with increased compute investment, future models may not yield proportional improvements. As compute costs rise and returns begin to diminish, AI developers may encounter limits in their ability to scale reasoning models using current methods.
Viability and Adoption
Beyond scaling, practical limitations may also hinder the deployment of reasoning models. Despite their advanced capabilities, these systems can still produce inaccurate outputs—commonly referred to as “hallucinations”—potentially more often than some traditional AI models.
This issue, combined with high training and inference costs, may limit real-world adoption, particularly in enterprise and safety-critical environments where reliability is paramount.
Potential Industry Impact
The anticipated slowdown could have broader implications for the AI sector. Over the past year, reasoning-focused models have emerged as a major area of investment, with applications in software development, scientific research, and diagnostics.
If scaling reinforcement learning becomes less viable, AI companies may need to reconsider their current roadmaps and explore alternative architectures or hybrid approaches that offer better efficiency.
The Epoch report suggests that while reinforcement learning has been instrumental in pushing the boundaries of model reasoning, it may no longer deliver exponential performance boosts without breakthroughs in methodology or infrastructure.
This could mark a shift in focus from sheer compute scaling to algorithmic innovation.
While Epoch AI’s conclusions are partly based on projections and selective disclosures from AI companies, the report provides a rare quantitative assessment of a key area in AI development.
If reinforcement learning continues to face economic and technical constraints, the industry may soon reassess the limits of today’s model architectures—and look toward new strategies for progress beyond scaling alone.