Chatbots Crash the Math Olympiad and Walk Away with Gold

  • AI models from Google DeepMind and OpenAI achieved gold medal-level performance on the 2025 IMO problems.
  • Google’s Gemini “Deep Think” model was officially evaluated by IMO coordinators, while OpenAI’s model was graded by three former IMO gold medallists.

In July 2025, Google DeepMind and OpenAI announced that their AI models had achieved gold medal-level scores on the problems of the 2025 International Mathematical Olympiad (IMO). These were not simplified or specially prepared versions but the actual problems set for this year’s contest.

Google’s model, an advanced version of Gemini running in “Deep Think” mode, was evaluated by IMO officials and scored 35 out of 42 points. OpenAI’s model, unnamed but confirmed as an experimental research model, was independently reviewed by three former IMO gold medallists and achieved the same score: 35.

What the Olympiad Is and Why It Matters

The International Mathematical Olympiad (IMO) is the world’s premier mathematics competition for secondary school students. Held annually, it brings together national teams from more than 100 countries. Each competitor tackles six problems in two sessions of 4.5 hours each over two days. These problems are highly challenging, typically requiring multiple steps of logical reasoning and advanced mathematical insight.

In 2025, 67 of the 630 contestants earned gold medals. The cutoff for gold stood at 35 points, equivalent to solving five of the six problems perfectly.

Google’s Path to Certification

Google’s Deep Think model, part of the Gemini family, submitted solutions anonymously for official IMO evaluation. The judges were unaware they were grading a machine’s work. This ensured a fair and unbiased assessment.

Each solution was written in natural mathematical language and assessed under the same scoring rubric applied to human contestants. Google only announced the results after they were officially validated.

This follows Google’s earlier effort in 2024, when the specialised AlphaProof and AlphaGeometry 2 systems reached silver medal-level performance. The 2025 achievement came from a general-purpose model capable of reasoning across multiple math domains.

OpenAI’s Announcement and Method

OpenAI’s research team used a similar approach but with independent grading. Their model’s solutions were reviewed by three former gold medallists, who confirmed they were valid and would earn full credit under IMO rules. Like Google’s model, it scored 35 out of 42.

Unlike Google, OpenAI released its results before the closing ceremony of the IMO. This prompted criticism from members of the education and Olympiad communities, who felt the announcement took attention away from the student contestants’ achievements.

OpenAI later clarified that its model had not taken part in the competition itself and that the release was intended to demonstrate advances in mathematical reasoning.

Breaking Down the Score

  • Number of problems: 6
  • Max points per problem: 7
  • Total score possible: 42
  • Time limit: Two 4.5-hour sessions over two days
  • Gold medal threshold (2025): 35 points
  • AI scores (Google and OpenAI): 35/42
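
For readers who want the arithmetic spelled out, the figures above combine as follows; this is simply an illustrative restatement of the numbers already given, not a new result:

$$6 \times 7 = 42 \ \text{(maximum score)}, \qquad 5 \times 7 = 35 \ \text{(gold threshold)}$$

In other words, both models’ score of 35 is consistent with full credit on five of the six problems and none on the remaining one.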

What the AI Did

The models did not simply calculate numerical answers. They generated full proofs written in natural mathematical language, just as human contestants do. These proofs had to be convincing and logically sound, justifying every step of the argument.

This contrasts with earlier AI math tools, which relied on symbolic computation or structured solvers. The new models demonstrated a capacity for general-purpose reasoning and explanatory argumentation.

Why This is a Benchmark, Not Just a PR Win

The IMO problems are a well-established academic benchmark. Unlike standardised tests or synthetic evaluations, they demand originality, persistence, and logical structure.

Google’s decision to submit its solutions anonymously and to withhold its results until after the official closing ceremony showed respect for the academic community. Meanwhile, OpenAI’s disclosure of its evaluators and results, despite the controversy over timing, also added to the credibility of its claims.

What Makes This Different From Past AI Feats?

AI systems have long been applied to mathematics, but Olympiad-level problems demand a higher order of reasoning. They are generally open-ended, resistant to brute-force search, and dependent on finding the right insight.

The ability of these AI models to solve five of the six problems and explain their reasoning marks a shift from language generation to structured logical thinking. It is not just about reaching the correct answer; it is about producing a convincing, human-readable proof.

Important Limits and Clarifications

The AI systems that achieved these scores are not publicly available. OpenAI said its model would not be released for “many months”, while Google’s Deep Think system was a private research configuration of Gemini.

Public versions of GPT-4 and Gemini score significantly lower—reportedly around 10 to 13 points out of 42—far below the gold threshold.

Both models also failed to solve one of the six Olympiad problems. Impressive as the results are, they are not perfect and should not be regarded as surpassing human capability across all dimensions.

The Public Response

The response to the announcements was mixed. Many researchers and technologists lauded the achievement, emphasising its future implications for tutoring, formal reasoning, and other academic applications.

The criticism, meanwhile, centred on OpenAI announcing its results before the official IMO results were made public. Some educators and Olympiad officials worried that the stellar achievements of student participants would be overshadowed by AI milestones.

Google, by contrast, was praised for waiting until after the awards ceremony, a choice seen as better aligned with community values.

What’s Next for AI in Math?

These results are a step forward, not an endpoint. AI’s ability to tackle Olympiad-level problems suggests future roles in research, proof generation, and automated theorem solving.

However, widespread deployment in education or academic settings will depend on transparency, reliability, and accessibility. AI still needs to demonstrate consistent reasoning and provide traceable explanations before it can be trusted in high-stakes environments.

A Final Thought

For over 60 years, the International Mathematical Olympiad has been a proving ground for the world’s top young problem-solvers. Now, for the first time, artificial intelligence has reached that bar—at least on paper.

Google DeepMind’s and OpenAI’s accomplishments are not about competition with students. They are about measuring the state of machine reasoning against one of the most rigorous tests available. The fact that both passed that test with gold-level results is a clear sign of how far AI has come.

What happens next will depend not just on what these models can do but on how and where we choose to use them.
