OpenData Vector: MIT-Licensed, Stateless Vector Search Engine Launches with $350/Month Economics
Key Takeaways
- Third-generation, stateless vector database architecture lets any node serve any data without local state or inter-node coordination
- MIT-licensed open-source alternative priced at ~$350/month for 100M vectors, significantly undercutting proprietary vendors
- Optimized for object storage such as S3, with batched index fetches that offset high request latency and per-request GET costs
Summary
OpenData Vector, a new MIT-licensed vector search engine, is positioning itself as a cost-effective alternative to proprietary vector database vendors. Built on SlateDB, the system is designed around a stateless, third-generation architecture that leverages object storage for durability and consistency, eliminating the operational complexity of managing stateful database nodes.
The architecture enables OpenData Vector to serve 100 million vectors for approximately $350 per month (about $3.50 per million vectors), a fraction of the cost of commercial solutions. It achieves this through three key design decisions: inverted-file (IVF) indexing tuned for the high latency of object storage, LIRE compaction on an LSM tree, and a share-everything state model in which any node can serve any data without local shard assignment or cluster coordination.
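To make the IVF-over-object-storage idea concrete, here is a minimal sketch in Python. It is not OpenData Vector's or SlateDB's API: `FakeObjectStore`, `build_index`, and `search` are hypothetical names, an in-memory dictionary stands in for S3, and a thread pool stands in for whatever batching the real engine uses. The point is only to show how a query can pick a few centroids, fetch their posting lists in one concurrent batch, and keep no state on the serving node.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an S3-like store; counts GETs."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()
        self.get_count = 0

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        with self._lock:
            self.get_count += 1
        return self._objects[key]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(store, centroids, vectors):
    """Assign each vector to its nearest centroid; write one posting list per centroid."""
    postings = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        postings[nearest].append((vec_id, vec))
    store.put("centroids", centroids)
    for i, plist in postings.items():
        store.put(f"postings/{i}", plist)


def search(store, query, k=2, nprobe=2):
    """Stateless query: everything it needs is read from the shared store."""
    centroids = store.get("centroids")
    ranked = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    keys = [f"postings/{i}" for i in ranked[:nprobe]]
    # Issue the posting-list GETs as one concurrent batch instead of serially,
    # so the query pays roughly one round trip of latency for all of them.
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        lists = list(pool.map(store.get, keys))
    candidates = [item for plist in lists for item in plist]
    candidates.sort(key=lambda item: l2(query, item[1]))
    return [vec_id for vec_id, _ in candidates[:k]]


store = FakeObjectStore()
build_index(
    store,
    centroids=[(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    vectors={"a": (0.5, 0.5), "b": (1.0, 0.0), "c": (9.0, 9.0),
             "d": (10.0, 11.0), "e": (0.0, 9.0)},
)
print(search(store, (0.4, 0.4), k=2, nprobe=2))  # ['a', 'b']
print(store.get_count)  # 3: one centroid read plus two posting lists
```

Because `search` takes the store as its only source of data, any process holding a reference to the same store could answer the same query, which is the property the share-everything model relies on. A production system would also cache centroids and bound the batch size; those details are omitted here.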
OpenData Vector fills a gap between self-hosted pgvector deployments and expensive managed vector database services. The stateless approach simplifies operations by delegating metadata management, durability, and replication entirely to object storage services such as S3, reflecting a shift in the database community toward treating object storage as a reliable, cost-effective foundation for online systems.
The project is designed to be operationally simple enough for self-hosting while remaining competitive on performance with managed services.
Editorial Opinion
OpenData Vector represents a meaningful shift toward commodity, stateless vector infrastructure that could democratize access to high-performance semantic search. By treating object storage as the source of truth rather than as a tiered cache, the project sidesteps the operational complexity that has traditionally pushed customers toward managed services. If its performance claims hold up on real workloads, it could substantially challenge the economic moat of existing vector database vendors.



