Evo 2: AI Foundation Model Trained on 9 Trillion DNA Base Pairs Achieves Genome-Scale Design Across All Life

Key Takeaways

▸Evo 2 is trained on 9 trillion DNA base pairs with a 1-million-token context window, covering all domains of life at single-nucleotide resolution
▸The model predicts the functional impact of genetic variants, including pathogenic mutations and BRCA1 variants, without task-specific fine-tuning
▸Evo 2 generates genome-scale mitochondrial, prokaryotic, and eukaryotic sequences, and its guided designs of chromatin accessibility patterns were validated experimentally

Source:

Hacker News, https://www.nature.com/articles/s41586-026-10176-5

Summary

Researchers have unveiled Evo 2, a biological foundation model trained on 9 trillion DNA base pairs from a comprehensive genomic atlas spanning all domains of life. Described in a paper published in Nature, the model has a 1-million-token context window at single-nucleotide resolution, which lets it both predict the functional impact of genetic variation and generate novel genomic sequences. It accurately predicts the effects of genetic changes, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific fine-tuning.

Mechanistic interpretability analyses of Evo 2 show that it has learned representations associated with exon-intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. Its generative capabilities extend to mitochondrial, prokaryotic, and eukaryotic sequences at genome scale, with greater naturalness and coherence than previous methods achieve. When guided by predictive models and inference-time search, Evo 2 generates sequences with designed chromatin accessibility patterns that have been validated experimentally.

In a significant move for scientific collaboration, the research team has made Evo 2 fully open source, releasing the model parameters, training code, inference code, and the OpenGenome2 dataset. The release aims to accelerate the exploration and design of biological complexity across the research community. The model marks a major advance in applying artificial intelligence to genomics and could transform how researchers understand and engineer biological systems across all forms of life.

▸Full open-source release includes model weights, training code, inference code, and the OpenGenome2 dataset
▸Mechanistic analysis shows the model learns biologically meaningful representations of genomic features like exon-intron boundaries and transcription factor binding sites

Editorial Opinion

Evo 2 represents a watershed moment in computational biology, demonstrating that foundation models can achieve meaningful biological understanding at unprecedented scale. The decision to fully open-source both the model and its 9-trillion-base-pair training dataset is particularly commendable and could democratize access to cutting-edge genomic AI tools. However, the real test will be whether Evo 2's predictions hold up consistently under experimental validation across diverse biological contexts, and whether the research community can responsibly navigate the dual-use implications of AI-powered genome design.