Arc Institute · Research · 2026-03-05

Evo 2: AI Foundation Model Trained on 9 Trillion DNA Base Pairs Achieves Genome-Scale Design Across All Life

Key Takeaways

  • Evo 2 is trained on 9 trillion DNA base pairs with a 1 million token context window, covering all domains of life at single-nucleotide resolution
  • The model predicts functional impacts of genetic variations, including pathogenic mutations and BRCA1 variants, without task-specific fine-tuning
  • Evo 2 generates genome-scale sequences for mitochondrial, prokaryotic, and eukaryotic organisms with experimentally validated results
Source: Hacker News, https://www.nature.com/articles/s41586-026-10176-5

Summary

Researchers have unveiled Evo 2, a biological foundation model trained on 9 trillion DNA base pairs from a genomic atlas spanning all domains of life. Published in Nature, the model has a 1 million token context window at single-nucleotide resolution, which supports both predicting the functional impact of genetic variation and generating novel genomic sequences. Without task-specific fine-tuning, it accurately predicts the effects of genetic changes ranging from noncoding pathogenic mutations to clinically significant BRCA1 variants.
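
To make the idea of variant-effect prediction without fine-tuning concrete, here is a minimal sketch of one common approach: score a mutation by how much it changes a sequence's log-likelihood under a model trained only on unlabeled DNA. This is an illustration, not Evo 2's method or API. The summary does not describe how Evo 2 computes its scores, and the toy order-2 Markov model, the function names, and the example sequences below are all assumptions made for the sketch.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # one token per nucleotide (single-nucleotide resolution)


def train_markov(sequences, k=2):
    """Count (previous k bases -> next base) transitions.

    A toy stand-in for a genomic language model; NOT Evo 2.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts


def log_likelihood(seq, counts, k=2):
    """Sum of log P(base | previous k bases), with add-one smoothing."""
    total = 0.0
    for i in range(k, len(seq)):
        context_counts = counts.get(seq[i - k:i], {})
        numerator = context_counts.get(seq[i], 0) + 1
        denominator = sum(context_counts.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def variant_score(reference, position, alt_base, counts, k=2):
    """Log-likelihood of the mutated sequence minus that of the reference.

    Negative scores mean the model finds the variant less plausible
    than the reference. No labeled variant data is used anywhere.
    """
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(variant, counts, k) - log_likelihood(reference, counts, k)


if __name__ == "__main__":
    # Hypothetical training data: a simple repeating motif.
    model = train_markov(["ACGTACGTACGTACGT"] * 5)
    reference = "ACGTACGTACGT"
    # Substitute the C at index 5 with an A and compare likelihoods.
    print(variant_score(reference, 5, "A", model))
```

In this toy setting, a substitution that breaks the repeating motif receives a negative score because the model has never seen the resulting contexts. A large model would presumably play the role of `log_likelihood` here, but that mapping is an assumption rather than something the article states.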

Mechanistic interpretability analyses of Evo 2 find learned representations associated with exon-intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. The model can also generate mitochondrial, prokaryotic, and eukaryotic sequences at genome scale, with greater naturalness and coherence than previous methods. When guided by predictive models and inference-time search, Evo 2 generates chromatin accessibility patterns that have been experimentally validated.

The research team has made Evo 2 fully open source, releasing model parameters, training code, inference code, and the OpenGenome2 dataset. The release is intended to accelerate the exploration and design of biological complexity across the research community. The model is a major advance in applying artificial intelligence to genomics and could change how researchers understand and engineer biological systems across all forms of life.

  • Full open-source release includes model weights, training code, inference code, and the OpenGenome2 dataset
  • Mechanistic analysis shows the model learns biologically meaningful representations of genomic features like exon-intron boundaries and transcription factor binding sites

Editorial Opinion

Evo 2 represents a watershed moment in computational biology, demonstrating that foundation models can achieve meaningful biological understanding at unprecedented scale. The decision to fully open-source both the model and the 9 trillion base pair training dataset is particularly commendable, potentially democratizing access to cutting-edge genomic AI tools. However, the real test will be whether Evo 2's predictions translate consistently to experimental validation across diverse biological contexts, and whether the research community can responsibly navigate the dual-use implications of AI-powered genome design capabilities.

