February 10, 2025
3 min read
Google’s AI Can Beat the Smartest High Schoolers in Math
Google’s AlphaGeometry2 AI reaches the level of gold-medal students in the International Mathematical Olympiad
By Davide Castelvecchi & Nature magazine

Google DeepMind’s AI AlphaGeometry2 aced problems set at the International Mathematical Olympiad.
Wirestock, Inc./Alamy Stock Photo
A year ago AlphaGeometry, an artificial-intelligence (AI) problem solver created by Google DeepMind, surprised the world by performing at the level of silver medallists in the International Mathematical Olympiad (IMO), a prestigious competition that sets complex maths problems for gifted high-school students.
The DeepMind team now says the performance of its upgraded system, AlphaGeometry2, has surpassed the level of the average gold medallist. The results are described in a preprint on the arXiv.
“I imagine it won’t be long before computers are getting full marks on the IMO,” says Kevin Buzzard, a mathematician at Imperial College London.
Solving problems in Euclidean geometry is one of the four topics covered by IMO problems; the others are number theory, algebra and combinatorics. Geometry demands specific skills of an AI, because competitors must provide a rigorous proof for a statement about geometric objects on the plane. In July, AlphaGeometry2 made its public debut alongside a newly unveiled system, AlphaProof, which DeepMind developed for solving the non-geometry questions in the IMO problem sets.
Mathematical language
AlphaGeometry is a combination of components that include a specialized language model and a ‘neuro-symbolic’ system, one that does not train by learning from data like a neural network but instead has abstract reasoning coded in by humans. The team trained the language model to speak a formal mathematical language, which makes it possible to automatically check its output for logical rigour, and to weed out ‘hallucinations’: the incoherent or false statements that AI chatbots are prone to making.
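The check-and-discard loop described above can be sketched in miniature: a generator proposes candidate statements in a tiny made-up formal language, and a hand-coded symbolic checker rejects anything ill-formed or false. This is a toy illustration of the idea only; AlphaGeometry’s actual proof language and verifier are far richer.

```python
def symbolic_check(statement: str) -> bool:
    """Accept only well-formed 'eq <int> <int>' claims that are actually true.

    A stand-in for the symbolic verifier: because the 'language model'
    output is in a formal language, every claim can be checked mechanically.
    """
    parts = statement.split()
    if len(parts) != 3 or parts[0] != "eq":
        return False  # ill-formed output: rejected outright
    try:
        a, b = int(parts[1]), int(parts[2])
    except ValueError:
        return False
    return a == b  # false claims ('hallucinations') are weeded out

# Imagine these are candidate statements emitted by a language model.
candidates = ["eq 4 4", "eq 4 5", "triangle is nice", "eq 7 7"]
verified = [s for s in candidates if symbolic_check(s)]
print(verified)  # ['eq 4 4', 'eq 7 7']
```

Only statements that pass the mechanical check survive, which is what makes a formal output language so useful: rigour becomes something a program can test, not something a reader must trust.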
For AlphaGeometry2, the team made several improvements, including the integration of Google’s state-of-the-art large language model, Gemini. The team also introduced the ability to reason by moving geometric objects around the plane, such as moving a point along a line to change the apex of a triangle, and by solving linear equations.
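As a rough illustration of those two new capabilities (my sketch, not DeepMind’s implementation), the snippet below slides a triangle’s apex along a line parametrically and solves a simple linear equation exactly.

```python
from fractions import Fraction

def point_on_line(p, q, t):
    """Parametric point p + t*(q - p); t=0 gives p, t=1 gives q."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def triangle_area(a, b, c):
    """Area via the shoelace formula."""
    return abs((b[0]-a[0]) * (c[1]-a[1]) - (c[0]-a[0]) * (b[1]-a[1])) / 2

# Slide the apex along the line y = 3 above a fixed base (0,0)-(4,0):
# the height never changes, so the area stays 6 for every t.
for t in (0.0, 0.5, 1.0):
    apex = point_on_line((0, 3), (4, 3), t)
    print(apex, triangle_area((0, 0), (4, 0), apex))

# Solve the linear equation 3x + 5 = 11 exactly: x = (11 - 5) / 3.
x = Fraction(11 - 5, 3)
print(x)  # 2
```

Moving a point while tracking what stays invariant is a classic move in olympiad geometry; here the invariant is the triangle’s area.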
The system was able to solve 84% of all geometry problems given in IMOs over the past 25 years, compared with 54% for the original AlphaGeometry. (Teams in India and China used different approaches last year to achieve gold-medal-level performance in geometry, but on a smaller subset of IMO geometry problems.)
The authors of the DeepMind paper write that future improvements to AlphaGeometry will include handling maths problems that involve inequalities and non-linear equations, which will be required to “fully solve geometry.”
Rapid progress
The first AI system to achieve a gold-medal score on the overall test could win a US$5-million award known as the AI Mathematical Olympiad Prize, although that competition requires systems to be open-source, which is not the case for DeepMind’s.
Buzzard says he is not surprised by the rapid progress made both by DeepMind and by the Indian and Chinese teams. However, he adds, although the problems are hard, the field is still conceptually simple, and there are many more challenges to overcome before AI can solve problems at the level of research mathematics.
AI researchers will be eagerly awaiting the next iteration of the IMO in Sunshine Coast, Australia, in July. As soon as its problems are made public for human contestants to solve, AI-based systems get to tackle them, too. (AI entrants are not allowed to take part in the competition itself, and are therefore not eligible to win medals.) Fresh problems are seen as the most legitimate test for machine-learning-based systems, because there is no chance that the problems or their solutions existed online and could have ‘leaked’ into training data sets, skewing the results.
This article is reproduced with permission and was first published on February 7, 2025.