
AlphaEvolve: DeepMind’s Gemini-Powered AI That Invents Algorithms and Breaks a 56-Year Record

Introduction

Google DeepMind’s AlphaEvolve is an experimental AI coding agent that can autonomously design and improve algorithms – a breakthrough that pushes the boundaries of what AI can create. Announced in mid-2025, AlphaEvolve uses Google’s cutting-edge Gemini large language models (LLMs) combined with an evolutionary search process and automated testing to iteratively refine computer code arxiv.org. Unlike a conventional chatbot, this agent doesn’t just generate answers – it writes, evaluates, and evolves programs to solve hard problems, ranging from abstract math puzzles to optimizing Google’s own computing systems tech.slashdot.org arxiv.org. The results have been remarkable: AlphaEvolve has invented new algorithms that beat decades-old records, and delivered real-world efficiency gains in data centers, hardware design, and AI model training. Researchers are hailing it as a significant step toward AI-driven scientific discovery tech.slashdot.org.

Key Achievements: AlphaEvolve’s early accomplishments include:

  • Shattering a 56-Year-Old Math Record: Discovered a faster method for 4×4 matrix multiplication – using 48 scalar multiplications instead of the 49 obtained by applying Strassen’s famous 1969 algorithm recursively arxiv.org.
  • Optimizing Google’s Infrastructure: Devised a new scheduling algorithm for Google’s data centers (recovering ~0.7% extra compute capacity globally), and even improved the design of Google’s next-generation AI chips.
  • Speeding Up AI Training: Found code optimizations that accelerated a crucial training operation by 23%, cutting large-model training time by ~1%, and boosted a key AI inference routine (FlashAttention) by 32%.
  • Advancing Open Problems in Math: Tackled over 50 open mathematical problems – rediscovering the best known solutions in ~75% of cases and improving on the best known solutions ~20% of the time. Notably, it set a new milestone in the 11-dimensional kissing number problem, raising the best known lower bound from 592 to 593 spheres.

AlphaEvolve’s versatility across such diverse challenges is turning heads. “It’s very surprising that you can do so many different things with a single system,” says Alexander Novikov, a senior research scientist on the project. In the sections below, we delve into how AlphaEvolve works, the landmark results it achieved, and what this means for the future of algorithmic discovery.

How AlphaEvolve Works: LLMs + Evolutionary Code Refinement

At its core, AlphaEvolve is an AI-driven software engineer that continually improves its own code solutions through a cycle of generation and feedback. It orchestrates a team of AI models and evaluators in an autonomous pipeline arxiv.org. The process works roughly as follows:

  • Problem Definition: A human provides an initial working solution (code) for a given problem, along with an evaluation function or tests that objectively measure solution quality (e.g. correctness, speed) tech.slashdot.org. This defines the task and how to judge progress.
  • Solution Generation: AlphaEvolve employs multiple LLMs (specifically Google’s Gemini models) to propose new candidate programs that tackle the problem tech.slashdot.org. It uses a fast, creative model (Gemini Flash) to generate a broad range of ideas, and a more powerful, detail-focused model (Gemini Pro) to suggest deeper improvements. Each candidate is essentially a modified version of the code (a “mutation”) aimed at improving some aspect of the algorithm.
  • Automated Evaluation: For each program generated, AlphaEvolve automatically runs tests or metrics to evaluate its performance deepmind.google tech.slashdot.org. The evaluators might check for correctness on benchmark inputs, measure execution speed, resource usage, or other domain-specific metrics. This step provides objective feedback – crucially reducing errors or AI “hallucinations,” since only factual, tested results are kept tech.slashdot.org.
  • Evolutionary Refinement: Inspired by evolution, the system retains the most promising programs and uses them to inform the next round of generation tech.slashdot.org. AlphaEvolve iteratively “breeds” better solutions by having the LLMs modify and recombine successful code snippets. Over many cycles, the programs gradually improve according to the evaluator’s metrics. This loop continues until no further gains are found or a time budget is reached (a minimal sketch of this loop follows the list).
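
To make the shape of this loop concrete, here is a minimal Python sketch. It is an illustration only, not DeepMind’s implementation: propose_mutation stands in for a call to a Gemini model and evaluate for the task-specific test harness, and both names are assumptions made for this example.

```python
import random

def evolve(initial_program, propose_mutation, evaluate,
           population_size=20, generations=100):
    """Minimal evolutionary code-search loop (illustrative sketch, not AlphaEvolve itself).

    propose_mutation(program) -> modified program text (stands in for an LLM call)
    evaluate(program)         -> numeric score, higher is better (automated tests/metrics)
    """
    population = [(evaluate(initial_program), initial_program)]

    for _ in range(generations):
        # Pick a parent from the surviving pool of promising programs.
        _, parent = random.choice(population)

        # Ask the "LLM" for a mutated version of the parent program.
        child = propose_mutation(parent)

        # Objective feedback: candidates that fail evaluation are discarded.
        try:
            child_score = evaluate(child)
        except Exception:
            continue

        population.append((child_score, child))
        # Keep only the best-scoring programs for the next generation.
        population.sort(key=lambda item: item[0], reverse=True)
        population = population[:population_size]

    return population[0]  # (best_score, best_program)
```

The real system is considerably richer – it maintains a database of diverse candidate programs and mixes fast and strong Gemini models – but the generate, evaluate, select cycle above is the basic shape of the search.
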

Through this approach, AlphaEvolve moves beyond one-off code generation – it evolves entire codebases (hundreds of lines of code, not just a single function) and can optimize multiple objectives simultaneously. In essence, it leverages the creativity of large models but grounds them with rigorous testing at each step. Google DeepMind notes that this method allows the AI to tackle both “problems where discovering new algorithms is the goal, as well as problems where an algorithm is just the tool to solve something else”, as long as there’s a way to verify solutions automatically arxiv.org. By iteratively learning from its own trial-and-error, AlphaEvolve avoids false answers and homes in on high-quality solutions that even human experts hadn’t conceived.

Breaking a 56-Year-Old Barrier in Matrix Multiplication

One headline achievement of AlphaEvolve is its breakthrough in a classic computer science challenge: matrix multiplication. In 1969, Volker Strassen astonished the math world by discovering an algorithm to multiply 2×2 matrices using 7 multiplications instead of 8, kicking off a quest to multiply matrices faster than the textbook $O(n^3)$ method. However, even after decades of research, certain small-case improvements had eluded discovery. Notably, for 4×4 matrices, the best algorithm known for complex-number multiplication still required 49 scalar multiplications – a record held since Strassen’s era deepmind.google.
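
For context, Strassen’s original 2×2 trick can be written out in a few lines: the seven products below replace the eight multiplications of the schoolbook method, and the cost of the recursive 4×4 version (7 × 7 = 49 multiplications) is exactly the record AlphaEvolve broke. This snippet only restates the well-known 1969 construction; it is not AlphaEvolve’s new 48-multiplication algorithm, which is not reproduced here.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications (Strassen, 1969)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))

# Sanity check against the schoolbook (8-multiplication) result.
assert strassen_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))) == ((19, 22), (43, 50))
```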

AlphaEvolve finally surpassed that 56-year-old record. Using its evolutionary code search, the AI found a new algorithm to multiply two 4×4 matrices with only 48 multiplications (in the complex domain) deepmind.google. This is the first time anyone has beaten Strassen’s 1969 result for 4×4 matrix multiplication over complex numbers, representing a notable theoretical advance. In the words of DeepMind’s researchers, it’s “the first improvement, after 56 years, over Strassen’s algorithm in this setting.” arxiv.org

While reducing one multiplication might seem like a minor tweak, in algorithmic terms it’s a big deal. Matrix multiplication is a fundamental operation underpinning many computations (including machine learning), so even slight efficiency gains can compound into substantial savings at scale. Moreover, this discovery validates the ability of AI to contribute original insights in pure mathematics. “Every single such case is a new discovery,” notes DeepMind scientist Matej Balog, speaking about AlphaEvolve’s algorithmic finds. It also builds on DeepMind’s prior system AlphaTensor (2022), which specialized in uncovering fast matrix multiplication algorithms. AlphaEvolve went further: AlphaTensor’s best 4×4 result applied only to arithmetic modulo 2, whereas AlphaEvolve’s 48-multiplication algorithm works over the complex numbers – a more general setting that even AlphaTensor hadn’t cracked.

(Technical note: AlphaEvolve’s 4×4 method leverages complex arithmetic cleverly, and its real-number efficiency is a subject of ongoing analysis. But as a proof of concept, it demonstrates AI’s power in exploring the huge search space of possible algorithms where human intuition alone hit a wall.)

Boosting Data Center Efficiency by Evolving Better Schedules

Beyond math puzzles, AlphaEvolve has shown its prowess on practical engineering problems. One major success was in optimizing Google’s data center scheduling. Google’s massive data centers run on a cluster manager called Borg, which allocates tasks to servers. Even a tiny improvement in how resources are scheduled can translate into huge savings given Google’s scale. AlphaEvolve was tasked with finding a better scheduling heuristic – and it delivered. It discovered a simple but extremely effective scheduling algorithm that “continuously recovers, on average, 0.7% of Google’s worldwide compute resources” in production. In other words, by packing jobs onto machines more efficiently, it freed up roughly 0.7% of computing capacity across Google’s entire fleet, allowing more tasks to run on the same hardware.
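
To give a sense of what “a scheduling heuristic” means in this context, Borg-style schedulers typically score every candidate machine for an incoming job and place the job on the best-scoring one. The sketch below is a generic, illustrative scoring rule of that kind – it favors placements that keep leftover CPU and memory in balance so less capacity is stranded – and is not the heuristic AlphaEvolve actually discovered; the Job and Machine fields are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    free_cpu: float  # cores still available
    free_mem: float  # GiB still available

@dataclass
class Job:
    cpu: float
    mem: float

def placement_score(machine: Machine, job: Job) -> float:
    """Illustrative placement score: higher is better."""
    cpu_left = machine.free_cpu - job.cpu
    mem_left = machine.free_mem - job.mem
    if cpu_left < 0 or mem_left < 0:
        return float("-inf")  # job does not fit on this machine
    # Prefer tight packing (little left over) and balanced leftovers, since a
    # machine with plenty of CPU but no memory (or vice versa) is wasted capacity.
    return -(cpu_left + mem_left) - abs(cpu_left - mem_left)

def place(job: Job, machines: list[Machine]) -> Machine:
    return max(machines, key=lambda m: placement_score(m, job))
```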

This 0.7% gain may sound small, but for a hyperscaler like Google it is hugely significant – it’s as if tens of thousands of extra servers suddenly became available with zero new investment. Importantly, AlphaEvolve’s solution is a relatively straightforward piece of code that engineers can understand and deploy (unlike some opaque machine learning models). DeepMind notes it has “human-readable code” with advantages in interpretability and ease of maintenance. The new scheduling heuristic has been in production for over a year, proving its robustness. According to one report, it even outperformed an earlier approach discovered via deep reinforcement learning, setting a new state-of-the-art for Google’s Borg optimizer. This real-world impact underscores how an AI-generated algorithm can squeeze out efficiencies that expert-tuned systems hadn’t achieved – a tangible win for AI and Google’s bottom line.

Smarter Chip Design with AI-Generated Hardware Optimizations

AlphaEvolve’s ingenuity isn’t limited to software – it has also contributed to hardware design. In one example, the agent was tasked with improving a component of Google’s TPU (Tensor Processing Unit), the custom accelerator chips used for AI. The challenge involved a piece of highly optimized Verilog code (a hardware description language used to define chip circuitry) implementing an arithmetic circuit for matrix multiplication. Even in such a low-level domain, AlphaEvolve managed to find an improvement. It proposed a rewrite in Verilog that removed some unnecessary bits in the circuit logic, simplifying the design while preserving full functionality.

Crucially, any change to a chip design must undergo rigorous verification. AlphaEvolve’s suggestion passed all the verification tests to confirm it was functionally equivalent to the original. This optimized circuit is being integrated into an upcoming TPU model. In effect, an AI agent, working in tandem with human engineers, helped refine the silicon blueprint of future AI processors.

This achievement is striking for a few reasons. First, it shows AlphaEvolve can operate in the language of hardware engineers (Verilog), not just high-level pseudocode – demonstrating versatility across programming domains. Second, the fact that it could improve an already “highly optimized” circuit suggests AI might spot non-intuitive tweaks that human chip designers missed. Google DeepMind views this as a collaborative AI–human approach to chip engineering, where the AI suggests changes in familiar terms and humans verify and implement them. As AI-designed optimizations get folded into real hardware, we could see faster or more efficient chips that literally have evolved code inside them.

Speeding Up AI Training and Inference

Another area where AlphaEvolve has paid dividends is in accelerating Google’s own AI model development. Training giant neural networks is extremely resource-intensive, so even minor speed-ups can save time and cost. AlphaEvolve was applied to optimize critical code kernels in the training pipeline of Gemini (the very LLM that powers AlphaEvolve). One success story was finding a smarter way to perform a large matrix multiplication within Gemini’s architecture. By reorganizing the computation (essentially finding a more efficient tiling/decomposition of the matrix operations), AlphaEvolve achieved a 23% speed-up for that kernel. This optimization cascaded into roughly a 1% reduction in Gemini’s overall training time. A 1% training efficiency gain is significant at the scale of large models – it means days less training time and corresponding savings in compute power.
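
“Tiling” here simply means splitting a large matrix product into blocks that fit the accelerator’s memory hierarchy, then choosing those block sizes well. The NumPy sketch below illustrates the idea on a generic matmul; the actual Gemini kernel, its shapes, and the decomposition AlphaEvolve found are not public, so this is only an illustration of the knob being tuned, not the discovered optimization.

```python
import numpy as np

def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 64) -> np.ndarray:
    """Blocked matrix multiply: same result as A @ B, computed tile by tile."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(0, n, tile):          # rows of the output block
        for j in range(0, m, tile):      # columns of the output block
            for p in range(0, k, tile):  # slices of the shared inner dimension
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A, B = np.random.rand(256, 512), np.random.rand(512, 128)
assert np.allclose(tiled_matmul(A, B), A @ B)
```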

AlphaEvolve also tackled improvements in inference (AI model runtime). For instance, it was challenged to optimize the implementation of FlashAttention, a fast algorithm for the attention mechanism in Transformers. The code for FlashAttention was already auto-generated and highly tuned, making this a difficult test. Impressively, AlphaEvolve managed to accelerate the core FlashAttention kernel by 32%, and further streamlined the surrounding pre- and post-processing code for an additional ~15% speed gain. These optimizations were verified for correctness and demonstrate that even compiler-produced code can be improved by an AI agent’s creative search.
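
For readers unfamiliar with the kernel involved, the operation FlashAttention accelerates is mathematically just softmax(QKᵀ/√d)·V. The reference NumPy version below shows what is being computed; FlashAttention produces the same result in a tiled, memory-efficient way, and it is that highly tuned kernel – together with its surrounding pre- and post-processing – that AlphaEvolve sped up. The snippet is a plain reference implementation, not the optimized code.

```python
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stabilization
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of value vectors

Q, K, V = np.random.rand(8, 16), np.random.rand(8, 16), np.random.rand(8, 32)
out = attention(Q, K, V)  # shape (8, 32)
```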

In practical terms, such enhancements mean that AI services can run faster and more efficiently. For Google, a 1% training speedup or a 0.7% resource gain translates to millions of dollars saved and faster deployment of new models. Beyond raw performance, there’s also a productivity benefit: AlphaEvolve cut down the engineering time needed to optimize low-level kernels from “weeks of expert effort to days” in these cases. This hints at a future where AIs handle much of the heavy lifting in code optimization, allowing human developers to focus on higher-level design.

Tackling Open Problems in Mathematics and Algorithm Design

Perhaps the most intriguing demonstration of AlphaEvolve’s capability is its performance on open mathematical problems. The DeepMind team selected over 50 unsolved or long-standing problems across various fields – analysis, geometry, combinatorics, number theory – and set AlphaEvolve loose on them. These were not simple textbook exercises, but questions where the best solutions were unknown or unproven. Remarkably, AlphaEvolve managed to make headway on many of them.

According to the researchers, about 75% of the time the AI rediscovered the known state-of-the-art solution (essentially matching the best human result independently). Even more exciting, in roughly 20% of the problems AlphaEvolve actually improved the best-known solution, establishing new records or bounds. (In the remaining cases, ~5%, it converged to suboptimal answers, highlighting that it’s not infallible.) Each improvement is a genuine new discovery that expands human knowledge. “Every single such case is a new discovery,” emphasizes Matej Balog, who worked on the project.

One vivid example is the kissing number problem in 11 dimensions – a famous puzzle in geometry that has intrigued mathematicians for over 300 years. (The kissing number is the maximum count of equal-sized spheres that can all touch a given sphere without overlapping.) For most higher dimensions, only bounds are known. Previously, the best known lower bound in 11 dimensions was 592 spheres. AlphaEvolve found a packing arrangement of 593 spheres, nudging that lower bound up by one. That incremental improvement might sound small, but it’s a noteworthy advance in a problem so difficult that progress had stalled for decades. It demonstrates that AlphaEvolve can contribute original insights even in pure math research, lending evidence against the notion that large AI models “are not capable of original scientific contributions.”
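
The kissing number problem also illustrates the “automatically verifiable” requirement nicely: checking a claimed configuration is easy even though finding one is hard. Scaling the sphere centers to unit vectors, a valid configuration is simply a set of directions whose pairwise angles are all at least 60 degrees (pairwise dot products at most 1/2). A verifier along those lines might look like the sketch below – an illustration, not DeepMind’s actual evaluator.

```python
import numpy as np

def is_valid_kissing_configuration(points: np.ndarray, tol: float = 1e-9) -> bool:
    """Check a candidate kissing configuration in any dimension.

    `points` holds unit vectors from the central sphere toward each touching
    sphere; the configuration is valid if every pairwise dot product is <= 1/2,
    i.e. every pairwise angle is at least 60 degrees.
    """
    if not np.allclose(np.linalg.norm(points, axis=1), 1.0, atol=1e-6):
        return False                     # all directions must be unit length
    gram = points @ points.T             # matrix of pairwise dot products
    np.fill_diagonal(gram, 0.0)          # ignore each vector against itself
    return bool(gram.max() <= 0.5 + tol)

# Example: in 2 dimensions the kissing number is 6 (a hexagon of neighbors).
angles = np.arange(6) * np.pi / 3
hexagon = np.stack([np.cos(angles), np.sin(angles)], axis=1)
assert is_valid_kissing_configuration(hexagon)
```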

Beyond the kissing number, AlphaEvolve also helped devise a novel gradient-based optimization procedure that discovered multiple new matrix multiplication algorithms. It made strides in combinatorics problems and other challenges, often with only a few hours of setup per problem by the team. Importantly, these successes were achieved by the same general system, not a dozen specialized tools – highlighting the breadth of AlphaEvolve’s approach. By encoding problems as code and using a rigorous evaluate-and-evolve loop, the agent proved flexible enough to work on tasks ranging from pure math theory to industrial engineering.

Implications and Future Outlook

AlphaEvolve’s emergence hints at a new paradigm for algorithmic research. Instead of humans hand-crafting algorithms from intuition and trial-and-error, we can now ask an AI agent to systematically explore possibilities and evolve better solutions. This has potential far beyond the examples seen so far. DeepMind’s team believes AlphaEvolve’s generality means it “can be applied to any problem whose solution can be described as an algorithm, and automatically verified.” In the future, similar AI coders could tackle challenges in materials science (e.g. discovering new experimental protocols), drug discovery (optimizing synthetic routes), economics (finding better allocation algorithms), or any domain where you can write an evaluator to test a candidate solution.

It’s worth noting that AlphaEvolve is a research prototype and not yet available to the public. “The AI remains too complex for public release,” one report noted, though an early-access program for select academic users is being planned tech.slashdot.org. As the underlying LLMs improve (Google’s Gemini models are still evolving), AlphaEvolve is expected to become even more powerful and efficient at coding. The hope is to eventually integrate such agents into tools that many researchers and engineers can use. Google is already exploring ways to incorporate AlphaEvolve into its internal workflows – for example, possibly embedding it into compilers or software development suites to automatically suggest optimizations.

The success of AlphaEvolve also raises deeper questions. It challenges the assumption that AI models merely regurgitate training data; here we have an AI producing new knowledge – from math proofs to novel code – that wasn’t in any dataset. This blurs the line between human creativity and machine computation. There will be discussions around trust and verification (AlphaEvolve’s solutions are automatically verified for given tests, but formal proof of optimality is another matter in math problems). However, the collaborative model – AI proposing and humans validating – appears to be a fruitful one. As one observer put it, DeepMind’s latest agent “marks a significant step toward using this technology to tackle big problems in math and science.” tech.slashdot.org

In sum, AlphaEvolve represents a leap towards self-improving AI that can design better algorithms than we can. It has already outperformed human-crafted solutions in several domains – from squeezing more efficiency out of Google’s servers to discovering record-breaking mathematical algorithms. And it achieved all this with a single, unified system, powered by large language models guided by evolutionary feedback. If these early results are any indication, AI-driven algorithm discovery may become a powerful new tool in the scientist’s toolbox. We are witnessing the dawn of AI not just as a problem-solver, but as an inventor of solutions, potentially accelerating progress across computer science and beyond tech.slashdot.org.

Sources: Recent DeepMind white paper and blog announcement arxiv.org; reporting by IEEE Spectrum; Ars Technica/Slashdot summary tech.slashdot.org; and other expert analyses.
