.genome: New Open File Format Designed for AI to Read Human Genomes
Key Takeaways
- .genome is purpose-built for AI interpretation of genomic data, whereas VCF was designed roughly 15 years ago for specialist tools rather than AI systems
- The format separates variant data, interpretations, and decision rules into explicit, typed, versioned, and queryable components, eliminating implicit meaning that leads to AI hallucinations
- Answers are deterministic: rules are written once and computed at pipeline time against version-pinned guideline lists, not approximated by a model at inference time
Summary
Anthropic has released .genome/1.0, an open-source file format specification designed specifically for AI systems to read and interpret consumer genome data, along with readmygenome.md, a Claude skill that enables Claude instances to parse .genome bundles correctly. The existing VCF (Variant Call Format) standard from 2011 was designed for specialist tools and operators. By contrast, .genome explicitly separates variant data, interpretations, and decision rules into typed, versioned, and queryable components that AI can reliably process.
The new format addresses fundamental limitations in how AI systems currently interact with genomic data. Traditional VCF files rely on external context and implicit meaning that AI models must guess at, leading to potential misinterpretations of pathogenicity, effect sizes, pharmacogenomic phenotypes, and actionability flags. .genome eliminates these ambiguities by making all rules, thresholds, and guideline references explicit and deterministic: they are computed once at pipeline time rather than approximated by the model at inference time.
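As a rough illustration of the pipeline-time idea described above, the Python sketch below applies a rule once against a version-pinned guideline list and stores the result as a typed record, so that a reader only has to look it up. Every name in it (Variant, Interpretation, GUIDELINE, the rule ID, the placeholder gene names) is hypothetical and is not taken from the .genome specification.

```python
from dataclasses import dataclass

# Hypothetical version-pinned guideline list (placeholder gene names, not real data).
GUIDELINE = {
    "name": "example-actionable-genes",
    "version": "2024.1",
    "genes": frozenset({"GENE_A", "GENE_B"}),
}

@dataclass(frozen=True)
class Variant:
    gene: str
    classification: str  # e.g. "pathogenic" or "benign"

@dataclass(frozen=True)
class Interpretation:
    gene: str
    actionable: bool
    rule_id: str            # which rule produced this answer
    guideline_version: str  # which guideline version the rule was checked against

def interpret(variant: Variant) -> Interpretation:
    """Apply the rule once, at pipeline time, and record its provenance."""
    actionable = (
        variant.classification == "pathogenic"
        and variant.gene in GUIDELINE["genes"]
    )
    return Interpretation(variant.gene, actionable, "actionable-v1", GUIDELINE["version"])

# Pipeline time: compute interpretations and store them next to the variants.
variants = [
    Variant("GENE_A", "pathogenic"),
    Variant("GENE_C", "pathogenic"),  # pathogenic, but not on the guideline list
    Variant("GENE_B", "benign"),
]
bundle = {v.gene: interpret(v) for v in variants}

# Read time: the answer is a lookup of a stored field, not a fresh judgement.
def is_actionable(gene: str) -> bool:
    return bundle[gene].actionable
```

Because the stored record carries the rule ID and guideline version, two readers asking the same question of the same bundle get the same answer, and the answer can be traced back to the rule that produced it.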
.genome bundles offer formal correctness guarantees, delivering zero format-induced error for any query expressible over the schema's fields. The format also provides practical benefits including millisecond-scale gene-scoped queries, significantly smaller file sizes than annotated VCFs, and compatibility with standard tools like Parquet readers across Python, JavaScript, SQL, and browsers. Both the specification and the Claude skill are Apache-2.0 licensed and available on GitHub.
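To give a sense of what a gene-scoped query over typed fields might look like, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a Parquet reader. The table name, column names, and rows are all assumptions made for illustration and do not reflect the actual .genome schema.

```python
import sqlite3

# In-memory table standing in for a typed variant table (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (gene TEXT, position INTEGER, genotype TEXT)")
conn.execute("CREATE INDEX idx_variants_gene ON variants (gene)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?)",
    [
        ("GENE_A", 101, "0/1"),
        ("GENE_A", 250, "1/1"),
        ("GENE_B", 77, "0/0"),
    ],
)

def variants_for_gene(gene: str) -> list[tuple]:
    """Return all rows for one gene, ordered by position."""
    cur = conn.execute(
        "SELECT gene, position, genotype FROM variants "
        "WHERE gene = ? ORDER BY position",
        (gene,),
    )
    return cur.fetchall()
```

The point of the sketch is that the query names a field and a value directly; nothing has to be parsed out of free-text annotations before filtering.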
- Practical advantages include formal correctness guarantees, faster queries (milliseconds vs. seconds), smaller file sizes, and universal compatibility with existing data tools
Editorial Opinion
.genome represents a maturation in how specialized data formats adapt to AI workflows. Rather than expecting models to reverse-engineer meaning from formats designed for human specialists, this approach treats AI as a first-class reader deserving explicit, queryable semantics. The emphasis on deterministic, rule-based computation over model approximation is particularly important for genomics, where hallucinated pathogenicity claims or misinterpreted pharmacogenomic phenotypes could have real health consequences. If adopted widely, .genome could become a template for how other clinical and scientific data formats should evolve for the AI era.



