
DeepMind hits milestone in solving maths problems — AI’s next grand challenge
Original link: https://www.nature.com/articles/d41586-025-01523-z
DeepMind's AlphaEvolve is a general-purpose AI system that has made progress on hard problems in mathematics and computer science. It combines the creativity of a large language model (LLM) with rigorous algorithmic evaluation, improving solutions against performance metrics. Unlike specialized AI tools, AlphaEvolve draws on the broad abilities of LLMs to generate code across different domains. DeepMind has already applied it to improve its tensor processing units, cutting Google's use of its computing resources by 0.7%. AlphaEvolve builds on the earlier FunSearch system and can handle larger, more complex algorithms. Notably, it devised a matrix-multiplication method faster than the established Strassen algorithm, even surpassing DeepMind's specialized AlphaTensor AI. Experts see AlphaEvolve as a major advance that demonstrates the potential of general-purpose LLMs to make new discoveries.
DeepMind says that AlphaEvolve has helped to improve the design of AI chips. Credit: Christian Ohde/IMAGO via Alamy
Google DeepMind has used chatbot models to come up with solutions to major problems in mathematics and computer science.
The system, called AlphaEvolve, combines the creativity of a large language model (LLM) with algorithms that can scrutinize the model’s suggestions to filter and improve solutions. It was described in a white paper released by the company on 14 May.
“The paper is quite spectacular,” says Mario Krenn, who leads the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. “I think AlphaEvolve is the first successful demonstration of new discoveries based on general-purpose LLMs.”
As well as using the system to discover solutions to open maths problems, DeepMind has already applied the artificial intelligence (AI) technique to its own practical challenges, says Pushmeet Kohli, head of science at the firm in London.
AlphaEvolve has helped to improve the design of the company’s next generation of tensor processing units — computing chips developed specially for AI — and has found a way to more efficiently exploit Google’s worldwide computing capacity, saving 0.7% of total resources. “It has had substantial impact,” says Kohli.
Most of the successful applications of AI in science so far — including the protein-designing tool AlphaFold — have involved a learning algorithm that was hand-crafted for its task, says Krenn. But AlphaEvolve is general-purpose, tapping the abilities of LLMs to generate code to solve problems in a wide range of domains.
DeepMind describes AlphaEvolve as an ‘agent’, because it uses interacting AI models. But it targets a different point in the scientific process from many other ‘agentic’ AI science systems, which have been used to review the literature and suggest hypotheses.
AlphaEvolve is based on the firm’s Gemini family of LLMs. Each task starts with the user inputting a question, criteria for evaluation and a suggested solution, for which the LLM proposes hundreds or thousands of modifications. An ‘evaluator’ algorithm then assesses the modifications against the metrics for a good solution (for example, in the task of assigning Google’s computing jobs, researchers want to waste fewer resources).
On the basis of which solutions are judged to be the best, the LLM suggests fresh ideas, and over time the system evolves a population of stronger algorithms, says Matej Balog, an AI scientist at DeepMind who co-led the research. “We explore this diverse set of possibilities of how the problem can be solved,” he says.
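The description above amounts to a propose, score, select loop. The snippet below is a minimal sketch of that loop with toy stand-ins, not DeepMind's implementation: `propose` plays the role of the LLM generating modifications, `evaluate` plays the role of the evaluator algorithm scoring candidates against the user's metrics, and the 'candidate' is just a number rather than a piece of code. All names and details here are illustrative assumptions.

```python
import random

def evolve(initial, propose, evaluate, generations=30, population_size=16, n_parents=4):
    """Keep a scored population of candidates; repeatedly let the proposer
    mutate the strongest ones and retain the best of the enlarged pool."""
    population = [(initial, evaluate(initial))]
    for _ in range(generations):
        parents = sorted(population, key=lambda p: p[1], reverse=True)[:n_parents]
        children = [(c, evaluate(c)) for cand, _ in parents for c in propose(cand)]
        population = sorted(population + children, key=lambda p: p[1], reverse=True)[:population_size]
    return population[0]

# Toy usage: the "candidate" is a number and the evaluator rewards closeness
# to a target; in AlphaEvolve the candidate is code and the evaluator runs it
# against real performance metrics, such as wasted compute.
best, score = evolve(
    initial=0.0,
    propose=lambda x: [x + random.uniform(-0.5, 0.5) for _ in range(4)],  # stands in for the LLM
    evaluate=lambda x: -abs(x - 3.14159),                                 # stands in for the evaluator
)
print(best, score)
```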
AlphaEvolve builds on the firm’s FunSearch system, which in 2023 was shown to use a similar evolutionary approach to outdo humans on unsolved problems in maths [1]. Compared with FunSearch, AlphaEvolve can handle much larger pieces of code and tackle more complex algorithms across a wide range of scientific domains, says Balog.
DeepMind says that AlphaEvolve has come up with a way to perform a calculation, known as matrix multiplication, that in some cases is faster than the fastest-known method, which was developed by German mathematician Volker Strassen in 1969 [2]. Such calculations involve multiplying grids of numbers and are used to train neural networks. Despite being general-purpose, AlphaEvolve outperformed AlphaTensor, an AI tool described by the firm in 2022 and designed specifically for matrix multiplication [3].
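For context, Strassen's 1969 construction forms a 2 × 2 matrix product from seven multiplications instead of the naive eight, and applying it recursively to blocks speeds up large multiplications. The sketch below reproduces that classic trick only, as a concrete example of what a faster matrix-multiplication method looks like; it is not the new algorithm reported for AlphaEvolve.

```python
def strassen_2x2(A, B):
    """2x2 matrix product using Strassen's seven multiplications
    instead of the naive eight."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

# Sanity check against the naive definition (eight multiplications).
A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
naive = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert strassen_2x2(A, B) == naive
```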