
Google AI wins IMO silver medal, just one point away from gold! It only took 19 seconds to solve the fourth question

2024-07-26


Bai Jiao, Xi Feng | Reporting from Aofei Temple
Quantum Bit | Public Account QbitAI

Just now, large models claimed yet another victory!

Google DeepMind announced that its math AI has achieved silver-medal level at the IMO (International Mathematical Olympiad), just one point short of gold!

Yes, you read that right: these are Olympiad problems that most people find forbidding. Of this year's 609 IMO contestants, only 58 scored at the gold-medal level.



This time, Google's AI solved 4 of the 6 problems in the 2024 IMO and earned full marks on every problem it solved, for a total of 28 points. (The maximum score is 42; the gold-medal threshold was 29.)



Problem 4, a geometry problem, took the AI only 19 seconds?!

Problem 6, said to be the hardest in this year's competition, was answered completely correctly by only five human contestants.



The results were also professionally certified by the IMO organizers: the solutions were scored by Professor Timothy Gowers, an IMO gold medalist and Fields Medal winner, and Dr. Joseph Myers, a two-time IMO gold medalist and Chair of the 2024 IMO Problem Selection Committee.

Professor Timothy Gowers exclaimed: far beyond the most advanced level I knew of.

So how was this achieved? Let's take a look.

Google reaches IMO silver-medal level, and a new member of the Alpha family makes its debut

This IMO silver-medal result was achieved by two members of Google's Alpha family, each with its own specialty:

  • AlphaProof, a new member of the Alpha family, a formal mathematical reasoning system based on reinforcement learning.
  • AlphaGeometry 2, an improved version of AlphaGeometry, specifically used to solve geometric problems.

Let’s first meet the new member - AlphaProof.

It is a self-training system that can prove mathematical statements using the formal language Lean. It combines a pre-trained language model with the AlphaZero reinforcement learning algorithm.

By fine-tuning Gemini, the team automatically translated natural-language problem statements into formal Lean statements, building a large library of formalized mathematical problems.
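To give a sense of what such a formalization looks like, here is a toy example in Lean 4 with Mathlib. The statement, theorem name, and proof are our own illustration of the natural-language-to-Lean idea, not actual AlphaProof output.

```lean
import Mathlib

-- Natural-language statement: "The square of any real number is non-negative."
-- A possible Lean 4 formalization (the theorem name is purely illustrative):
theorem square_nonneg_example (x : ℝ) : 0 ≤ x ^ 2 := by
  exact sq_nonneg x
```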

When presented with a problem, AlphaProof generates solution candidates and then proves or disproves these candidates by searching for possible proof steps in Lean.

Each proof found and verified is used to strengthen AlphaProof's language model, improving its ability to solve subsequent, more challenging problems.
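Conceptually, this amounts to a generate-verify-reinforce loop. The sketch below is a heavily simplified illustration of that idea under our own assumptions; every name in it (ProofModel, lean_verify, solve_and_learn) is a hypothetical placeholder, not AlphaProof's actual code or API.

```python
from dataclasses import dataclass, field

@dataclass
class ProofModel:
    """Stand-in for a language model that proposes Lean proof attempts."""
    verified_examples: list = field(default_factory=list)

    def generate_candidates(self, statement: str, n: int = 4) -> list[str]:
        # A real system would sample proof attempts from the model;
        # here we just return dummy strings so the sketch runs.
        return [f"-- attempt {i} for: {statement}" for i in range(n)]

    def reinforce(self, statement: str, proof: str) -> None:
        # Verified (statement, proof) pairs become new training signal,
        # AlphaZero-style, strengthening the model for harder problems.
        self.verified_examples.append((statement, proof))

def lean_verify(statement: str, proof: str) -> bool:
    """Placeholder for invoking the Lean checker on a candidate proof."""
    return "-- attempt 0" in proof  # dummy rule so the sketch terminates

def solve_and_learn(statement: str, model: ProofModel, rounds: int = 3):
    """Search for a verified proof; feed every verified proof back into training."""
    for _ in range(rounds):
        for candidate in model.generate_candidates(statement):
            if lean_verify(statement, candidate):      # keep only checked proofs
                model.reinforce(statement, candidate)  # strengthen the model
                return candidate
    return None  # no complete proof found within the budget

print(solve_and_learn("example statement", ProofModel()))
```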

In the weeks leading up to the competition, AlphaProof was trained over and over by proving or disproving millions of IMO-level problems.

The training cycle was also applied during the competition itself, continuously reinforcing its own proofs until a complete solution was found.



Now let's take a closer look at the upgraded AlphaGeometry 2. It is a neuro-symbolic hybrid system whose language model is based on Gemini.

Its predecessor, version 1.0, was also published in Nature this year: it reached the geometry level of an IMO gold medalist without any human demonstrations.



Compared with the previous version, it was trained from scratch on an order of magnitude more synthetic data, and its symbolic engine is two orders of magnitude faster. When it encounters a new problem, a new knowledge-sharing mechanism enables advanced combinations of different search trees to tackle more complex problems.

Before the official competition, it could already solve 83% of all IMO geometry problems from the past 25 years, while its predecessor could only solve 53%.

In this year's IMO competition, it completed the fourth question in just 19 seconds.



Next, let's take a look at how these two systems worked together in this IMO.

First, the problems were manually translated into formal mathematical language so that the systems could understand them.

For comparison, human contestants submit their answers in two sessions of 4.5 hours each.

Google's two systems solved one problem within minutes, but took up to three days on the others.

Ultimately, AlphaProof solved two algebra problems and one number theory problem by determining the answers and proving their correctness.

These included the hardest problem in the competition: Problem 6, which only five human contestants solved at this year's IMO.



AlphaGeometry 2 solved the geometry problem, while the two combinatorics problems remained unsolved.

In addition, the Google team also experimented with a natural-language reasoning system based on Gemini. That is, it needs no translation of problems into formal language and can be used in conjunction with other AI systems.

The team said they will next explore more AI methods to advance mathematical reasoning.

More technical details about AlphaProof are also planned to be released soon.

Netizens: I don't understand math, but I'm shocked

After seeing the performance of these two systems, netizens said, "I don't understand math but I am greatly shocked."

Scott Wu, co-founder of Cognition AI (the team behind the AI programmer Devin), said:

Such results are truly amazing. As a kid, Olympiad competitions were everything to me. I never thought that 10 years later they would be solved by AI.



OpenAI scientist Noam Brown also offered his congratulations:



However, some netizens pointed out that under the standard competition schedule (the contest is held over two days, with 4.5 hours and three problems per day), the two AI systems would actually have solved only one of the six problems.



This statement was immediately refuted by some netizens:

In this context, speed is not the primary concern. For a fixed number of floating-point operations (FLOPs), adding more computing resources simply reduces the wall-clock time needed to solve the problem.



Regarding this point, some netizens also questioned:

The two AI systems failed to solve the combinatorics problems. Was that a matter of training, insufficient compute, or insufficient time? Or were there other limitations?



Professor Timothy Gowers tweeted his opinion:

If human contestants were allowed more time per problem, their scores would undoubtedly be higher. For AI systems, though, this is already far beyond the capability of previous automated theorem provers; and as efficiency improves, the time required should shrink further.



And yet, only a few days ago, large models were still stumbling over the grade-school question "Which is bigger, 9.11 or 9.9?" How can a large model now crack Olympiad-level problems?!

It was hopelessly muddled just a moment ago; how did it suddenly become so clear-headed?



Nvidia scientist Jim Fan explains: it comes down to the training data distribution.

The Google systems are trained on formal proofs and domain-specific symbolic engines, so in a sense they are highly specialized for Olympiad problems, even though they are built on general-purpose large models.



GPT-4o's training set, by contrast, contains a large amount of GitHub code, likely far more than mathematical data. In software versioning, "v9.11 > v9.9", and that convention may seriously skew the distribution. Seen in that light, the error is understandable.
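A quick illustration of the point (our own example, not from Jim Fan's post): the same pair of values compares one way as decimal numbers and the opposite way as software version components.

```python
# Compared as decimal numbers, 9.11 is smaller than 9.9.
print(9.11 > 9.9)  # False

# Compared as version strings (split on "." and compare components as integers),
# "9.11" is larger than "9.9" because 11 > 9.
def version_tuple(v: str) -> tuple[int, ...]:
    return tuple(int(part) for part in v.split("."))

print(version_tuple("9.11") > version_tuple("9.9"))  # True
```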

He described this strange phenomenon as follows:

We found a very strange region, like an exoplanet that looks like Earth but is full of uncanny valleys.

Some enthusiastic netizens also tagged OpenAI: maybe you could give it a try too...

In response, Altman said:



Reference Links:
[1] https://x.com/googledeepmind/status/1816498082860667086?s=46
[2] https://x.com/jeffdean/status/1816498336171753948?s=46
[3] https://x.com/quocleix/status/1816501362328494500?s=46
[4] https://x.com/drjimfan/status/1816521330298356181?s=46
[5] https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/