BotBeat

PRODUCT LAUNCH · Arc Institute · 2026-03-05

Evo 2: Open-Source AI Trained on Trillions of DNA Bases Can Decode Complex Genomes

Key Takeaways

  • Evo 2 is trained on 8.8 trillion DNA bases from bacteria, archaea, and eukaryotes, which lets it analyze complex genomes, including human DNA
  • The system can identify subtle genomic features, such as splice sites and regulatory sequences, that existing tools struggle to detect accurately
  • Training ran in two stages: first on 8,000-base chunks to learn genomic features, then on sequences of up to one million bases to recognize large-scale patterns
Source: Hacker News, https://arstechnica.com/science/2026/03/large-genome-model-open-source-ai-trained-on-trillions-of-bases/

Summary

Researchers have released Evo 2, an open-source AI system trained on 8.8 trillion DNA bases spanning all three domains of life: bacteria, archaea, and eukaryotes. Its predecessor, Evo, focused on bacterial genomes; Evo 2 takes on the much harder task of interpreting eukaryotic genomes such as the human genome. The system can identify genes, regulatory sequences, splice sites, and other genomic features that even specialized bioinformatics tools often struggle to detect accurately.
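To see why splice sites are hard to pin down with simple rules, consider the textbook GT-AG rule: most introns begin with GT and end with AG. The sketch below is a minimal illustration, not how Evo 2 or any production annotator works; the function name `candidate_splice_sites` and the toy sequence are hypothetical. It flags every GT and AG dinucleotide as a candidate site.

```python
def candidate_splice_sites(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of every GT and AG dinucleotide in `seq`.

    Naive rule (illustrative only): most introns begin with GT (donor side)
    and end with AG (acceptor side), so every GT/AG is treated as a candidate.
    """
    seq = seq.upper()
    donors, acceptors = [], []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair == "GT":
            donors.append(i)
        elif pair == "AG":
            acceptors.append(i)
    return {"donor": donors, "acceptor": acceptors}


# Hypothetical toy sequence, not taken from any real gene.
toy = "ATGGTAAGTCCAGGTAGCAG"
sites = candidate_splice_sites(toy)
print(sites["donor"])     # [3, 7, 13]
print(sites["acceptor"])  # [6, 11, 15, 18]
```

In a uniformly random sequence, any given dinucleotide turns up at about 1 in 16 positions, so a one-million-base stretch would yield roughly 62,500 GT candidates, almost all of them false. Real splice sites are defined by a wider, probabilistic sequence context around the dinucleotide, which is why a fixed rule like this over-calls so badly.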

Evo 2 is built on StripedHyena 2, a hybrid architecture that combines convolution-based operators with attention, and was trained in two stages on the OpenGenome2 dataset. The first stage used 8,000-base chunks so the model could learn to recognize local features; the second extended training to sequences of up to one million bases so it could pick up large-scale genomic patterns. The long context matters because eukaryotic genomes are far less orderly than bacterial ones: instead of compact, contiguous genes, they contain coding sequences interrupted by introns, regulatory elements scattered far from the genes they control, and vast amounts of non-coding DNA, all of which make pattern recognition extremely difficult.
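The two training stages differ mainly in how much sequence the model sees at once. The article does not say how genomes were split into training examples, so the sketch below simply assumes non-overlapping tiling; real pipelines may use overlaps, padding, or random sampling. The window sizes are the article's round figures rather than an exact training configuration, and the names `windows`, `count_windows`, `STAGE1_WINDOW`, and `STAGE2_WINDOW` are hypothetical.

```python
from typing import Iterator

# Round figures from the article; illustrative, not the exact training config.
STAGE1_WINDOW = 8_000        # stage 1: short chunks for learning local features
STAGE2_WINDOW = 1_000_000    # stage 2: long sequences for large-scale patterns


def windows(seq: str, size: int) -> Iterator[str]:
    """Yield consecutive, non-overlapping windows of at most `size` bases.

    The final window is shorter when len(seq) is not a multiple of `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def count_windows(length: int, size: int) -> int:
    """Number of windows `windows` yields for a sequence of `length` bases."""
    return -(-length // size)  # ceiling division


# A hypothetical 2.5-million-base genomic segment.
segment_length = 2_500_000
print(count_windows(segment_length, STAGE1_WINDOW))  # 313
print(count_windows(segment_length, STAGE2_WINDOW))  # 3
```

The trade-off is visible in the counts: the same segment becomes 313 short examples in stage 1, each too small to relate features far apart from one another, but only 3 long examples in stage 2, each spanning enough sequence to connect distant elements.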

The model developed internal representations of key genomic features, including weakly defined ones such as splice sites and transcription factor binding sites, which are characterized by probabilistic preferences at each base rather than a fixed sequence. Notably, the researchers excluded viruses that infect eukaryotes from the training data because of biosafety concerns that the model could be misused to help create human pathogens. Releasing Evo 2 as open-source software is a significant step for computational genomics and could speed up genome annotation, comparative genomics, and the study of gene regulation across the tree of life.
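A classical way to describe a motif with "probabilistic rather than absolute" base requirements is a position weight matrix (PWM): each position gets a probability for each base, and a candidate site is scored by a log-odds sum against a background model. The article does not say Evo 2 uses PWMs (it learns its own internal representations); the sketch below only illustrates the concept. `TOY_PWM` is a hypothetical 4-position motif, not a real transcription-factor or splice-site matrix, and the uniform background is an assumption.

```python
import math

# Hypothetical 4-position motif for illustration only. Each column holds the
# probability of each base at that position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # no preference here
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assume uniform base frequencies in the background


def log_odds(window: str, pwm: list[dict[str, float]]) -> float:
    """Sum of log2(P(base | motif) / P(base | background)) over positions."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, window))


def scan(seq: str, pwm: list[dict[str, float]]) -> list[tuple[int, float]]:
    """Score every full-length window of `seq` (uppercase A/C/G/T only)."""
    width = len(pwm)
    return [(i, log_odds(seq[i:i + width], pwm)) for i in range(len(seq) - width + 1)]


scores = dict(scan("TTAGCTCCAGGA", TOY_PWM))
print(round(scores[2], 2))  # 4.46  ("AGCT": matches every informative position)
print(round(scores[8], 2))  # 1.65  ("AGGA": one mismatch, still scores above zero)
```

The second window shows the key property: a site can break the consensus at one position and still look more motif-like than background, so no single exact string defines the motif.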

  • The model is released as open source, though its training data excluded eukaryotic viruses to reduce potential biosecurity risks
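The article does not describe how the viral sequences were excluded. As a generic illustration of this kind of dataset curation, the sketch below filters sequence records on metadata fields. `SeqRecord` here is a hypothetical dataclass (not Biopython's class of the same name), and its field names and the `"Eukaryota"` label are assumptions. Dropping viruses with an unknown host by default is a conservative design choice made for this sketch, not something the source reports.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SeqRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    accession: str
    is_virus: bool
    host_domain: Optional[str]  # e.g. "Eukaryota"; None when unknown or not applicable
    sequence: str


def keep_for_training(rec: SeqRecord, drop_unknown_host: bool = True) -> bool:
    """Return False for viruses with a eukaryotic host (and, by default, an unknown host)."""
    if not rec.is_virus:
        return True
    if rec.host_domain is None:
        return not drop_unknown_host
    return rec.host_domain != "Eukaryota"


def filter_records(records: Iterable[SeqRecord]) -> List[SeqRecord]:
    """Keep only records that pass `keep_for_training` with its default settings."""
    return [r for r in records if keep_for_training(r)]


# Placeholder records; accessions and sequences are made up.
records = [
    SeqRecord("rec1", False, None, "ACGT"),        # cellular genome: kept
    SeqRecord("rec2", True, "Bacteria", "GGCC"),   # bacteriophage: kept
    SeqRecord("rec3", True, "Eukaryota", "TTAA"),  # eukaryotic virus: dropped
    SeqRecord("rec4", True, None, "CCGG"),         # virus, host unknown: dropped by default
]
print([r.accession for r in filter_records(records)])  # ['rec1', 'rec2']
```

In practice such a filter is only as good as the taxonomy and host annotations in the source databases, which is why the sketch treats missing host information as a reason to exclude rather than include.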

Editorial Opinion

Evo 2 represents a watershed moment in computational biology, demonstrating that foundation models can tackle the messy complexity of real-world eukaryotic genomes rather than just the tidier patterns found in bacterial DNA. The decision to release this powerful tool as open source while thoughtfully excluding potentially dangerous viral sequences from its training data shows responsible innovation in action. It could democratize advanced genome analysis that was previously accessible mainly to well-funded institutions, potentially accelerating discoveries in medicine, agriculture, and evolutionary biology across the research community.

Machine Learning · Deep Learning · Healthcare · Science & Research · Open Source

© 2026 BotBeat