PageIndex Scales to Millions of Documents with New File System
Key Takeaways
- PageIndex File System launches for enterprise, enabling vectorless retrieval across millions of documents
- Tree-based architecture targets two core limitations of vector embeddings: loss of semantic continuity, and the gap between similarity and relevance
- LLMs navigate the document hierarchy for reasoning-based retrieval instead of ranking by cosine similarity
Summary
PageIndex, an open-source vectorless RAG framework that has grown to 26,000+ GitHub stars in just months, announced the PageIndex File System today. The new capability lets enterprises scale retrieval across millions of documents by representing them as hierarchical trees rather than embedding vectors. It is available immediately to enterprise customers, with a cloud rollout coming later this month, and it targets the limitations that cause traditional vector-based RAG systems to break down at scale.
Traditional vector-based RAG has two critical failure modes. First, embeddings have limited representational power, and chunking long documents sacrifices semantic continuity. Second, cosine similarity is a poor proxy for actual relevance in domains such as legal, medical, and financial services. PageIndex's tree-based approach lets LLMs navigate document structure to find answers based on reasoning and context rather than surface-level similarity. The framework currently serves 23,000+ production users and was recently selected for the GitHub Secure Open Source Fund, positioning it as a major force in AI infrastructure.
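To make the idea of navigating a document tree concrete, here is a minimal sketch of top-down retrieval over a hierarchy. It is an illustration only, not PageIndex's actual API: the `Node` class, the `navigate` function, and the sample report are all hypothetical, and `keyword_chooser` is a simple word-overlap stand-in for the step where a real system would ask an LLM which branch to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One section of a document tree (hypothetical structure)."""
    title: str
    summary: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# A chooser takes the question and the candidate children and
# returns the index of the child to descend into.
Chooser = Callable[[str, List[Node]], int]


def keyword_chooser(question: str, candidates: List[Node]) -> int:
    """Stand-in for an LLM call: pick the child whose title and
    summary share the most words with the question."""
    q_words = set(question.lower().split())
    scores = [
        len(q_words & set(f"{c.title} {c.summary}".lower().split()))
        for c in candidates
    ]
    return scores.index(max(scores))


def navigate(root: Node, question: str,
             choose: Chooser = keyword_chooser) -> List[Node]:
    """Walk from the root to a leaf, choosing one child per level.
    Returns the full path so the answer's location is traceable."""
    path = [root]
    node = root
    while node.children:
        node = node.children[choose(question, node.children)]
        path.append(node)
    return path


# Illustrative document tree (made-up section names and text).
report = Node("Annual Report", "company annual report", children=[
    Node("Financial Statements", "balance sheet, income and cash flow",
         children=[
             Node("Income Statement", "revenue and net income for the year",
                  text="(income statement text)"),
             Node("Balance Sheet", "assets and liabilities",
                  text="(balance sheet text)"),
         ]),
    Node("Risk Factors", "legal and market risk disclosures",
         text="(risk factors text)"),
])

path = navigate(report, "what was net income for the year")
print(" > ".join(n.title for n in path))
```

Because `choose` is a plain function argument, the keyword heuristic can be swapped for a model call without touching the traversal. The returned path also records which sections were followed, which is what distinguishes this style of retrieval from a flat ranked list of chunks.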
- PageIndex has reached 26,000+ GitHub stars, serves 23,000+ cloud users, and has ranked #1 on GitHub Trending



