BotBeat

OPEN SOURCE · Anthropic · 2026-07-03

Semantic Manifest Enables ClaudeBot to Ingest 58,000-Page Site at 7 URLs Per Second

Key Takeaways

  • Semantic Manifest uses NDJSON streaming so AI crawlers can parse an entire site with its structural context, addressing the scalability limits of sitemaps, JSON-LD, and llms.txt
  • ClaudeBot ingested a 58,000-page site at roughly 7 URLs per second, a result presented as real-world evidence of gains over legacy web standards
  • The specification is open source (CC0-licensed), already has production implementations, and is designed to benefit the entire AI ecosystem rather than a single company
Source: Hacker News, https://github.com/CKL75/semantic-manifest-specification

Summary

An open specification called Semantic Manifest has been released to address limitations in how AI crawlers and large language models discover and ingest web content. Traditional web standards such as sitemaps, JSON-LD, and llms.txt files are inefficient for AI at scale: sitemaps lack structural context, JSON-LD is scoped to a single page, and flat text files consume large numbers of context-window tokens. Semantic Manifest uses a streamable NDJSON format (one JSON object per line) that lets AI systems parse a site's content types, relational entities, and structured markdown representations line by line with minimal overhead.
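The line-by-line parsing described above can be sketched in a few lines of Python. This is a minimal illustration of reading NDJSON as a stream, not the specification's actual schema: the `type`, `url`, and `title` keys, the sample records, and the helper names `iter_records` and `count_types` are all assumptions made for the example.

```python
import io
import json
from collections import Counter
from typing import Iterator, TextIO


def iter_records(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON") from exc
        if isinstance(record, dict):
            yield record


def count_types(stream: TextIO) -> Counter:
    """Tally records by a hypothetical "type" key without loading the whole file."""
    return Counter(record.get("type", "unknown") for record in iter_records(stream))


# Hypothetical sample data; the keys below are illustrative, not taken from the spec.
SAMPLE = (
    '{"type": "page", "url": "https://example.com/a", "title": "A"}\n'
    '{"type": "page", "url": "https://example.com/b", "title": "B"}\n'
    "\n"
    '{"type": "entity", "url": "https://example.com/e/1", "title": "E1"}\n'
)

counts = count_types(io.StringIO(SAMPLE))
print(dict(counts))
```

Because each record is a complete JSON object on its own line, a consumer only ever holds one record in memory, which is the property that makes the format streamable. The same generator would work over an open file or a network response that is read line by line.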

In one production deployment, Anthropic's ClaudeBot ingested a roughly 58,000-page site at approximately 7 URLs per second, within hours of the manifest being linked in the page header. The specification is currently at v0.1 and lists reference implementations including EduStats (58,000 pages) and the Hypersonic SEO Framework. Created by Chris Limner and released under CC0 1.0 Universal, Semantic Manifest is intended as an open infrastructure standard for how the wider AI crawler ecosystem, including LLMs and RAG engines, consumes web-scale content.
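For a sense of scale, the reported figures imply a full pass over the site in a little over two hours. The sketch below is back-of-the-envelope arithmetic only: it assumes a constant rate and one URL fetched per page, neither of which the article states, and `crawl_duration_hours` is a hypothetical helper.

```python
def crawl_duration_hours(pages: int, urls_per_second: float) -> float:
    """Estimate crawl time in hours, assuming a constant fetch rate."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    return pages / urls_per_second / 3600


# Figures from the article: ~58,000 pages at ~7 URLs per second.
hours = crawl_duration_hours(58_000, 7)
print(f"{hours:.1f} hours")  # prints "2.3 hours"
```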

By reducing context window consumption and improving crawler efficiency, Semantic Manifest could become foundational infrastructure for how AI systems discover and ingest web content at scale.
Generative AI · MLOps & Infrastructure · Open Source
