
Google AI wins the silver medal at IMO, AlphaProof is unveiled, and reinforcement learning is so back

2024-07-26


Report by the Synced (Machine Heart) Editorial Department

Built on the Gemini large model and the AlphaZero reinforcement learning algorithm, it has geometry, algebra, and number theory all well in hand.

For AI, the Mathematical Olympiad is no longer an unsolvable problem.

On Thursday, Google DeepMind's AI accomplished a feat: it solved real problems from this year's International Mathematical Olympiad (IMO) and came just one step short of a gold medal.



The IMO that concluded last week featured six problems in total, covering algebra, combinatorics, geometry, and number theory. Google's hybrid AI system solved four of them correctly, earning 28 points and reaching silver medal level.

Earlier this month, UCLA professor Terence Tao had just helped launch the AI Mathematical Olympiad (AIMO) Progress Prize, with a million-dollar prize pool. Unexpectedly, before July was even over, AI's problem-solving ability had already reached this level.

Taking on this year's IMO problems, and solving the hardest one

The IMO is the oldest, largest, and most prestigious competition for young mathematicians, held annually since 1959. In recent years, it has also become widely recognized as a grand challenge for machine learning and an ideal benchmark for measuring an AI system's capacity for high-level mathematical reasoning.

At this year's IMO competition, AlphaProof and AlphaGeometry 2, developed by the DeepMind team, jointly achieved a milestone breakthrough.

Among them, AlphaProof is a reinforcement learning system for formal mathematical reasoning, while AlphaGeometry 2 is an improved version of AlphaGeometry, DeepMind's geometry-solving system.

This breakthrough demonstrates that general artificial intelligence (AGI) with advanced mathematical reasoning capabilities has the potential to open up new areas of science and technology.

So, how did DeepMind's AI system participate in the IMO competition?

In simple terms, the competition problems were first manually translated into formal mathematical language so that the AI system could work on them. In the official competition, human contestants submit their answers in two sessions (over two days), with a 4.5-hour limit per session. The combined AlphaProof + AlphaGeometry 2 system solved one problem within minutes but needed up to three days for the others. Under a strict reading of the rules, then, DeepMind's system would have timed out, and some observers speculate that heavy brute-force search was involved.



Google said AlphaProof solved two algebra problems and one number theory problem by determining the answer and proving it correct. These included the hardest problem in the competition, which only five human contestants solved at this year's IMO. AlphaGeometry 2 proved the geometry problem.

Solutions given by AI: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/index.html

Prof. Timothy Gowers, an IMO gold medalist and Fields Medalist, and Dr. Joseph Myers, a two-time IMO gold medalist and chair of the IMO 2024 Problem Selection Committee, graded the combined system's solutions according to the official IMO scoring rules.

Each of the six problems is worth 7 points, for a maximum total score of 42. DeepMind's system finished with 28 points, meaning all four problems it solved received full marks, the highest score in the silver medal range. This year's gold medal threshold was 29 points, and 58 of the 609 contestants in the official competition won gold.
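
The arithmetic behind the headline is simple: full marks on the four solved problems and zero on the remaining two leave the system exactly one point below the gold cutoff:

$$\underbrace{4 \times 7}_{\text{solved, full marks}} + \underbrace{2 \times 0}_{\text{unsolved}} = 28 < 29 \;(\text{gold threshold}).$$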



This chart shows how Google DeepMind's AI system performed relative to human contestants at IMO 2024. The system earned 28 of 42 possible points, on par with the competition's silver medalists; this year, 29 points would have been enough for gold.

AlphaProof: A Formal Reasoning Method

In Google's hybrid system, AlphaProof is a self-training component that proves mathematical statements in the formal language Lean. It combines a pre-trained language model with the AlphaZero reinforcement learning algorithm.

Formal languages offer a crucial advantage: mathematical proofs written in them can be formally verified for correctness. Their use in machine learning, however, has previously been constrained by the very limited amount of human-written formal data.

Natural-language approaches, by contrast, can draw on orders of magnitude more data, but they tend to produce intermediate reasoning steps and solutions that look plausible yet are incorrect.

Google DeepMind bridged these two complementary domains by fine-tuning a Gemini model to automatically translate natural-language problem statements into formal ones, creating a large library of formal problems of varying difficulty.

Given a math problem, AlphaProof generates candidate solutions and then tries to prove or disprove them by searching over possible proof steps in Lean. Each proof that is found and verified is used to reinforce AlphaProof's language model, enhancing its ability to solve subsequent, more challenging problems.
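
To make the Lean side of this concrete, here is a minimal illustrative sketch; the informal statement, theorem name, and proof are invented for this article and are not an actual IMO 2024 formalization or AlphaProof output. The point is that once a statement is formalized, any candidate proof can be machine-checked by the Lean kernel.

```lean
import Mathlib

-- Informal statement: "for all real numbers a and b, 2ab ≤ a² + b²".
-- A toy hand-written Lean 4 / Mathlib formalization of the kind AlphaProof
-- searches over; NOT one of the actual IMO 2024 problems.
theorem two_mul_le_sq_add_sq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  -- A verified proof step: follows from (a - b)² ≥ 0.
  nlinarith [sq_nonneg (a - b)]
```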

To train AlphaProof, in the weeks leading up to the IMO, Google DeepMind had it prove or disprove millions of problems covering a wide range of difficulties and topics. The training loop was also applied during the competition itself, reinforcing proofs of self-generated variants of the contest problems until complete solutions were found.
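
DeepMind has not published AlphaProof's implementation, so the sketch below is only a conceptual caricature of such a generate, verify, reinforce loop; every name in it (`formalize`, `lean_verify`, `Policy`, and so on) is a hypothetical placeholder, not DeepMind's API.

```python
# Conceptual sketch of an AlphaProof-style training loop.
# All functions and classes are hypothetical placeholders, not DeepMind's code.
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Stand-in for the proof-step language model being reinforced."""
    verified_proofs: list = field(default_factory=list)

    def propose_proof(self, statement: str) -> list[str]:
        # In the real system, a Gemini-based model proposes Lean tactic steps,
        # guided by AlphaZero-style search rather than a single shot.
        return [f"tactic_for({statement})"]

    def reinforce(self, statement: str, proof: list[str]) -> None:
        # Verified proofs become fresh training signal for the model.
        self.verified_proofs.append((statement, proof))

def formalize(informal_problem: str) -> str:
    """Placeholder for the fine-tuned Gemini autoformalizer (NL -> Lean)."""
    return f"theorem t : {informal_problem}"

def lean_verify(statement: str, proof: list[str]) -> bool:
    """Placeholder for checking a candidate proof with the Lean kernel."""
    return bool(proof)  # real verification is done by Lean itself

def training_loop(informal_problems: list[str], policy: Policy) -> None:
    for problem in informal_problems:
        statement = formalize(problem)           # NL -> formal statement
        proof = policy.propose_proof(statement)  # proof search, simplified here
        if lean_verify(statement, proof):
            policy.reinforce(statement, proof)   # only verified proofs feed back

training_loop(["2 * a * b <= a ^ 2 + b ^ 2"], Policy())
```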



Infographic of the AlphaProof reinforcement-learning training loop: roughly one million informal math problems are translated into formal mathematical language by a formalizer network. A solver network then searches for proofs or disproofs of them, progressively training itself via the AlphaZero algorithm to solve ever more challenging problems.

A more competitive AlphaGeometry 2

AlphaGeometry 2 is a significantly improved version of AlphaGeometry, the mathematical AI that appeared in Nature earlier this year. It is a neuro-symbolic hybrid system whose Gemini-based language model was trained from scratch on an order of magnitude more synthetic data than its predecessor, helping it tackle much harder geometry problems, including problems about the movements of objects and equations of angles, ratios, or distances.

AlphaGeometry 2 also employs a symbolic engine that is two orders of magnitude faster than its predecessor. When it encounters a new problem, a novel knowledge-sharing mechanism enables advanced combinations of different search trees, allowing it to tackle more complex problems.
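
Drawing on the design described in the original AlphaGeometry paper, the neuro-symbolic interplay can be caricatured as alternating between exhaustive symbolic deduction and model-proposed auxiliary constructions. The sketch below is a hypothetical illustration; every function in it is invented for this article, and the real engines are vastly more sophisticated.

```python
# Hypothetical sketch of a neuro-symbolic geometry loop in the AlphaGeometry style.
# propose_construction, deduce_closure, and solve are invented placeholders.

def propose_construction(facts: set[str], goal: str) -> str:
    """Stand-in for the Gemini-based model suggesting an auxiliary object,
    e.g. 'a point E on line BI with angle AEB = 90 deg' as in IMO 2024 Problem 4."""
    return f"aux_object_{len(facts)}"

def deduce_closure(facts: set[str]) -> set[str]:
    """Stand-in for the fast symbolic engine: apply geometric deduction
    rules until no new facts emerge (here, a single dummy pass)."""
    return facts | {f"derived({f})" for f in facts}

def solve(premises: set[str], goal: str, max_rounds: int = 10) -> bool:
    facts = set(premises)
    for _ in range(max_rounds):
        facts = deduce_closure(facts)   # symbolic deduction to a fixpoint
        if goal in facts:               # goal derived: a proof can be read off
            return True
        # Deduction is stuck: the language model adds an auxiliary construction
        # and the symbolic engine tries again with the enlarged diagram.
        facts.add(propose_construction(facts, goal))
    return False
```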

Before this year's competition, AlphaGeometry 2 could already solve 83% of all IMO geometry problems from the past 25 years, compared with 53% for its predecessor. At IMO 2024, it solved Problem 4 within 19 seconds of receiving the formalized problem.



Illustration of Problem 4, which asks for a proof that ∠KIL + ∠XPY = 180°. AlphaGeometry 2 proposed constructing a point E on line BI such that ∠AEB = 90°. Point E gives purpose to the midpoint L of AB, creating many pairs of similar triangles, such as ABE ~ YBI and ALE ~ IPC, needed to prove the conclusion.

Google DeepMind also reported that, as part of its IMO work, researchers experimented with a natural-language reasoning system built on Gemini and its latest research, aiming for advanced problem-solving capabilities. The system requires no translation of problems into a formal language and can be combined with other AI systems. Tested on this year's IMO problems, it "showed great potential."

Google is continuing to explore AI methods for advancing mathematical reasoning and plans to release more technical details about AlphaProof soon.

We can’t wait to see a future in which mathematicians use AI tools to explore hypotheses, try bold new approaches to long-standing problems, and quickly finish the time-consuming parts of proofs, and in which AI systems like Gemini become even more capable at mathematics and reasoning more broadly.

Research team

Google said the new research was supported by the International Mathematical Olympiad organization, and credited the teams as follows:

Development of AlphaProof was led by Thomas Hubert, Rishi Mehta, and Laurent Sartran; key contributors included Hussain Masoom, Aja Huang, Miklós Z. Horváth, Tom Zahavy, Vivek Veeriah, Eric Wieser, Jessica Yung, Lei Yu, Yannick Schroecker, Julian Schrittwieser, Ottavia Bertolli, Borja Ibarz, Edward Lockhart, Edward Hughes, Mark Rowland, and Grace Margand.



Notably, Aja Huang, Julian Schrittwieser, and Yannick Schroecker were also core authors of the AlphaGo paper eight years ago (2016). Eight years ago, their reinforcement-learning-based AlphaGo rose to fame; eight years later, reinforcement learning shines again in AlphaProof. As one person exclaimed on WeChat Moments: RL is so back!



AlphaGeometry 2 and natural language inference work was led by Thang Luong. Development of AlphaGeometry 2 was led by Trieu Trinh and Yuri Chervonyi, with important contributions from Mirek Olšák, Xiaomeng Yang, Hoang Nguyen, Junehyuk Jung, Dawsen Hwang, and Marcelo Menegali.



In addition, David Silver, Quoc Le, Demis Hassabis, and Pushmeet Kohli coordinated and managed the overall project.

References:

https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/