BotBeat

RESEARCH | Google / Alphabet | 2026-02-26

DeepMind Researchers Use LLMs to Autonomously Discover New Multi-Agent Learning Algorithms

Key Takeaways

  • DeepMind's AlphaEvolve uses LLMs to automatically discover new multi-agent learning algorithms, reducing reliance on manual human design
  • Two novel algorithms emerged: VAD-CFR for regret minimization and SHOR-PSRO for population-based training, both outperforming existing state-of-the-art methods
  • The discovered algorithms employ non-intuitive mechanisms that human researchers might not have considered, including volatility-sensitive discounting and dynamic meta-solver blending
Source: Hacker News, https://arxiv.org/abs/2602.16928

Summary

Researchers from DeepMind have published work demonstrating how large language models can automatically discover novel multi-agent reinforcement learning (MARL) algorithms. The team, led by Zun Li along with John Schultz, Daniel Hennes, and Marc Lanctot, applied AlphaEvolve, an evolutionary coding agent, to navigate the complex design space of game-theoretic learning algorithms while reducing reliance on manual design.

The research addresses a longstanding challenge in MARL: while foundational approaches like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) have strong theoretical foundations, designing their most effective variants has traditionally required extensive manual experimentation and human intuition. AlphaEvolve autonomously evolved two novel algorithms that outperform state-of-the-art baselines in imperfect-information games.
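For readers unfamiliar with regret minimization, the following is a minimal sketch of regret matching, the per-decision update that CFR builds on, run in self-play on rock-paper-scissors. It is not code from the paper: the game, the function names, and the asymmetric starting regrets are assumptions chosen for illustration.

```python
# Payoff to the row player in rock-paper-scissors (zero-sum game).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def self_play(iterations=50000):
    """Run regret matching for both players and return their average strategies."""
    n = 3
    # Illustrative assumption: start player 0 with some regret for "rock"
    # so the dynamics do not sit at the uniform equilibrium from step one.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(r) for r in regrets]
        # Expected utility of each pure action against the opponent's current mix.
        utils_row = [sum(PAYOFF[a][b] * strategies[1][b] for b in range(n))
                     for a in range(n)]
        utils_col = [sum(-PAYOFF[a][b] * strategies[0][a] for a in range(n))
                     for b in range(n)]
        for player, utils in enumerate((utils_row, utils_col)):
            expected = sum(s * u for s, u in zip(strategies[player], utils))
            for a in range(n):
                # Regret: how much better action a would have done than the mix.
                regrets[player][a] += utils[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = self_play()
```

The current strategies can cycle, but the time-averaged strategies approach the uniform equilibrium. Variants of CFR differ largely in how they weight or discount these accumulated regrets, which is the kind of design choice the article describes being searched automatically.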

The first discovery, Volatility-Adaptive Discounted CFR (VAD-CFR), introduces non-intuitive mechanisms including volatility-sensitive discounting and consistency-enforced optimism to improve upon existing regret minimization approaches. The second, Smoothed Hybrid Optimistic Regret PSRO (SHOR-PSRO), employs a hybrid meta-solver that dynamically transitions from encouraging population diversity to rigorous equilibrium finding. Both algorithms demonstrate superior empirical convergence compared to manually designed alternatives, suggesting that LLM-driven algorithm discovery could accelerate progress in complex AI research domains.
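The article does not give VAD-CFR's update rule, so the sketch below is only a toy illustration of what "volatility-sensitive discounting" could look like. It starts from a discount factor of the form t^e / (t^e + 1), as used in Discounted CFR, and lets a hypothetical volatility estimate shrink the exponent. The functions `adaptive_alpha` and `update_volatility`, and all parameter values, are assumptions and are not taken from the paper.

```python
def dcfr_discount(t, exponent):
    """Discount factor of the form t^e / (t^e + 1), as in Discounted CFR."""
    weight = float(t) ** exponent
    return weight / (weight + 1.0)


def adaptive_alpha(base_alpha, volatility):
    """HYPOTHETICAL rule: higher volatility lowers the exponent,
    which discounts old positive regret more aggressively."""
    return base_alpha / (1.0 + volatility)


def update_volatility(volatility, inst_regret, decay=0.9):
    """HYPOTHETICAL volatility estimate: an exponential moving average
    of the mean absolute instantaneous regret."""
    magnitude = sum(abs(r) for r in inst_regret) / len(inst_regret)
    return decay * volatility + (1.0 - decay) * magnitude


def discounted_update(cum_regret, inst_regret, t, volatility,
                      base_alpha=1.5, beta=0.0):
    """Discount accumulated regret, then add this iteration's regret.

    Positive totals use the (volatility-adjusted) alpha exponent and
    negative totals use beta; the ordering of discount and add is a
    simplification for illustration.
    """
    alpha = adaptive_alpha(base_alpha, volatility)
    updated = []
    for total, delta in zip(cum_regret, inst_regret):
        exponent = alpha if total > 0 else beta
        updated.append(total * dcfr_discount(t, exponent) + delta)
    return updated


# Same regrets, two volatility levels: the noisy case keeps less old regret.
calm = discounted_update([4.0, -2.0], [1.0, 1.0], t=4, volatility=0.0)
noisy = discounted_update([4.0, -2.0], [1.0, 1.0], t=4, volatility=2.0)
```

The point of the sketch is the shape of the idea, a discount schedule that reacts to how unstable recent updates have been, rather than any specific formula. The actual mechanisms, including consistency-enforced optimism, are described in the linked paper.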

  • This approach could accelerate algorithmic innovation in game theory and reinforcement learning by automating the exploration of vast design spaces

Editorial Opinion

This research represents a fascinating meta-level application of AI: using large language models to discover better AI algorithms. The non-intuitive nature of the discovered mechanisms, such as volatility-adaptive discounting, suggests that LLMs may explore algorithmic design spaces differently than human researchers, potentially uncovering solutions that bypass human cognitive biases. If this approach generalizes beyond game-theoretic learning, we could be entering an era where AI systems routinely contribute to their own algorithmic evolution, accelerating the pace of AI research itself.

Reinforcement Learning, Multimodal AI, AI Agents, Machine Learning, Science & Research
