In a notable milestone for artificial intelligence, both OpenAI and Google DeepMind have reported that their latest models reached gold-medal-level performance at the 2025 International Math Olympiad (IMO), the prestigious mathematics competition contested by the world's top-performing high school students.
Highlights
- Breakthrough in AI Reasoning: OpenAI and Google DeepMind report solving 5 out of 6 problems from the 2025 International Math Olympiad (IMO), matching gold medalist performance.
- Informal, Natural Language Reasoning: Unlike past symbolic methods, both models reasoned in plain English—signaling a leap in AI’s ability to handle abstract, creative problems.
- Two Philosophies, One Goal: OpenAI used pure LLM chains for inference, while DeepMind used a hybrid symbolic-LLM method—showcasing different strategies to achieve similar outcomes.
- Evaluation Controversy: OpenAI’s evaluation relied on independent reviewers and preceded official IMO grading; DeepMind followed the formal review process, sparking debate about procedure and credibility.
- What Gold Means: Only about 10% of the 630+ human contestants earned gold medals. The AI models matched that top tier of performance in one of the world's most rigorous academic competitions.
- No Public Release Yet: Neither OpenAI nor DeepMind has released its model, citing experimental status. OpenAI suggests public access could take "many months."
- Strategy Over Speed: DeepMind’s emphasis on rigor and transparency reframes the AI race not just as a competition in capability, but also in scientific trust and governance.
Both companies announced that their systems correctly solved five out of six IMO problems, surpassing the performance of most student participants.
Unlike previous efforts that relied on formal symbolic systems requiring human pre-processing, this year's models tackled the problems using "informal" reasoning: they interpreted the natural-language questions directly and generated proof-based answers in plain English.
Informal Reasoning
This shift to informal systems marks a turning point in AI’s reasoning capabilities. Machines have historically struggled with the kind of ambiguity, multi-step logic, and creative thinking required in math olympiads.
Researchers from both companies describe this achievement as a substantial advance in general reasoning, particularly in solving non-verifiable problems that extend beyond traditional math exercises or programming tasks.
Google DeepMind used a hybrid approach that combined formal symbolic logic with large language model reasoning, while OpenAI’s approach relied entirely on LLM-generated reasoning—referred to internally as “pure LLM chains.”
Both methods underscore evolving philosophies in AI model design and hint at future directions for advanced problem-solving AI.
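To make the distinction concrete, the sketch below outlines what the two philosophies might look like in code. Neither lab has published its pipeline, so every name here (call_llm, formal_verifier, the draft-critique-revise loop) is a hypothetical stand-in for illustration, not a description of either company's actual system.

```python
# Conceptual sketch only: the function names and control flow are
# hypothetical placeholders, not either lab's published method.

def call_llm(prompt: str) -> str:
    """Stand-in for a large language model call (hypothetical)."""
    return f"[model output for: {prompt[:40]}...]"

def formal_verifier(formal_statement: str) -> bool:
    """Stand-in for a symbolic proof checker (hypothetical); a real one
    would mechanically validate each step of the argument."""
    return True

def solve_pure_llm(problem: str, max_rounds: int = 3) -> str:
    """'Pure LLM chain': the model drafts, critiques, and revises a
    natural-language proof entirely in plain English."""
    draft = call_llm(f"Write a complete proof: {problem}")
    for _ in range(max_rounds):
        critique = call_llm(f"Find gaps in this proof: {draft}")
        draft = call_llm(f"Revise the proof to address: {critique}")
    return draft  # final answer is an informal, human-readable proof

def solve_hybrid(problem: str) -> str:
    """Hybrid approach: the LLM produces reasoning plus a formal statement,
    and a symbolic component checks it before the answer is accepted."""
    informal = call_llm(f"Reason about: {problem}")
    formal = call_llm(f"Translate the key steps into a formal statement: {informal}")
    if formal_verifier(formal):
        return informal
    return call_llm(f"Retry with a corrected argument: {problem}")
```

The practical difference is where trust comes from: the pure-LLM chain relies on the model's own self-checking, while the hybrid route adds an external, mechanical verification step.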
Differing Approaches and Disputes Over Recognition
While both results are technically impressive, their release sparked debate—not about performance, but about procedure.
OpenAI publicized its achievement shortly after the IMO student awards ceremony, but before undergoing any official grading process sanctioned by the IMO. Instead, the company enlisted three former IMO medalists to independently evaluate its model’s output.
Google DeepMind, in contrast, participated in the official IMO evaluation. The company collaborated directly with the competition’s organizers and waited until the formal grading was completed before sharing its results publicly.
Thang Luong, who leads DeepMind’s math reasoning research, emphasized the importance of adhering to the IMO’s established evaluation standards. According to Luong, “Any evaluation not based on that guideline cannot claim gold-level performance.”
OpenAI has since clarified that it did not initially enter the formal process but chose to contact IMO organizers only after reaching what it believed to be a gold-worthy performance.
While OpenAI states it waited until after the student awards ceremony to make its announcement, some within the AI research community expressed concern over the timing and the transparency of the process.
What Gold Means in the IMO
Out of over 630 student participants this year, only around 10% received gold medals. That AI systems could match this level underscores a rapid acceleration in machine reasoning capabilities.
IMO problems demand creativity, deep abstraction, and long-form logical deduction—skills long thought to be uniquely human.
Unlike structured coding challenges or basic logic puzzles, these problems often require sustained reasoning across multiple conceptual domains, a hallmark of elite human cognition.
Performance and Methodology
- Google DeepMind's Model – A hybrid reasoning engine that blends formal symbolic logic with natural-language outputs (likely based on Gemini Deep Think).
- OpenAI’s Model – Pure LLM-based reasoning, without external symbolic formalization. All logic is derived from generative model chaining.
Both models executed their reasoning processes at test time using substantial computational resources.
OpenAI has not disclosed the compute cost, but its approach appears to have relied heavily on deep inference-time reasoning, pushing the bounds of what’s possible with large-scale models.
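As an illustration of what "deep inference-time reasoning" can mean in practice, the sketch below shows a generic best-of-n pattern from the research literature: spend extra compute at test time by sampling many long reasoning rollouts and keeping the one a grading pass scores highest. This is not OpenAI's or DeepMind's actual setup; all names are hypothetical placeholders.

```python
import random

def sample_candidate(problem: str, seed: int) -> str:
    """Stand-in for one long-form reasoning rollout (hypothetical)."""
    return f"candidate proof #{seed} for: {problem}"

def grade_candidate(candidate: str) -> float:
    """Stand-in for a self-grading or verifier pass (hypothetical)."""
    return random.random()

def best_of_n(problem: str, n: int = 32) -> str:
    """Larger n means more test-time compute and, typically, a better
    chance that at least one rollout is a sound proof."""
    candidates = [sample_candidate(problem, seed) for seed in range(n)]
    return max(candidates, key=grade_candidate)

print(best_of_n("IMO-style problem statement", n=8))
```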
Implications for AI Research and Education
This achievement is being viewed as a precursor to broader applications of AI in mathematics and science.
By demonstrating informal proof generation at a gold-medal level, both labs signal a future where AI could contribute meaningfully to open-ended scientific problems—not just replicate known patterns.
Neither company plans to release these exact models in the near future. OpenAI has suggested it will be “many months” before a public rollout, underscoring the models’ current experimental status.