Milvus Shifts Focus: From Performance Optimization to Cost-Efficient Vector Database Architecture
Key Takeaways
- Milvus pivoted after 8 years of pure performance focus to address cost efficiency—reflecting market maturity and real-world deployment constraints
- Zilliz Vector Lakebase introduces hybrid compute: always-on serving for hot data, on-demand instances for cold storage, fundamentally reducing infrastructure costs for sporadic access patterns
- New tiered architecture mirrors traditional database design (hot/cold storage) adapted for vector search, enabling semantic data to support multiple query lifecycles instead of requiring dedicated 24/7 infrastructure
Summary
After eight years optimizing vector database performance, Milvus is fundamentally pivoting toward cost efficiency with Zilliz Vector Lakebase. The new system introduces tiered storage architecture that separates always-on serving for frequently queried collections from on-demand compute for infrequently accessed data—addressing a critical gap in how vector databases handle real-world AI workloads.
The shift reflects changing market realities in AI infrastructure. While traditional Milvus kept entire vector indexes resident in memory across multiple 128GB+ nodes running 24/7, many production workloads don't justify that cost. Product teams running two-week A/B tests, SaaS platforms where 90% of users are inactive, and RAG systems where 80% of documents go unqueried for months all generate embeddings that sit mostly idle. The new architecture keeps hot data in memory for low-latency queries while cold data lives in cheaper object storage, loading on demand when needed.
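The hot/cold split described above can be sketched as a simple tiering policy: resident collections answer queries directly, while cold collections are pulled from object storage only when first touched. This is a minimal illustrative model, not the Zilliz Vector Lakebase API; the `TieredStore` class and collection names are assumptions for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Hypothetical sketch of hot/cold vector-collection tiering."""
    hot: dict = field(default_factory=dict)    # always-resident collections (in-memory tier)
    cold: dict = field(default_factory=dict)   # stand-in for cheaper object storage
    loads_from_cold: int = 0                   # counts on-demand loads from the cold tier

    def put(self, name, vectors, tier="cold"):
        # New collections default to the cold tier; only hot ones stay resident.
        (self.hot if tier == "hot" else self.cold)[name] = vectors

    def query(self, name):
        if name in self.hot:
            return self.hot[name]              # hot path: served from memory
        self.loads_from_cold += 1              # cold path: load on demand...
        vectors = self.cold[name]
        self.hot[name] = vectors               # ...then promote for subsequent queries
        return vectors
```

Under this policy, a rarely touched collection (an expired A/B test, say) pays the load cost only on its first query instead of occupying a 24/7 node.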
This represents a maturation of vector database technology. The technical constraints remain unchanged—S3's 20-50ms latency and HNSW graph traversal requirements still demand local memory for high-QPS serving. But the architecture now acknowledges that not all data has the same access pattern, enabling a single platform to serve both latency-critical and cost-optimized workloads simultaneously.
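A back-of-envelope calculation shows why graph traversal over object storage cannot serve high-QPS traffic. The 20-50ms round-trip figure comes from the article; the per-query hop count and memory access time below are illustrative assumptions, since real HNSW hop counts depend on graph parameters and dataset size.

```python
# Rough latency comparison: HNSW traversal touches many graph nodes per
# query, and each touch is either a memory access or an object-storage fetch.
S3_RTT_MS = (20, 50)       # per-fetch S3 latency range (from the article)
MEM_ACCESS_MS = 0.0001     # ~100 ns per in-memory access (assumed)
HOPS_PER_QUERY = 200       # assumed node visits per HNSW search (illustrative)


def query_latency_ms(per_hop_ms, hops=HOPS_PER_QUERY):
    """Total query latency if every hop pays the given per-hop cost."""
    return per_hop_ms * hops


s3_best = query_latency_ms(S3_RTT_MS[0])    # 4,000 ms per query
s3_worst = query_latency_ms(S3_RTT_MS[1])   # 10,000 ms per query
in_memory = query_latency_ms(MEM_ACCESS_MS) # 0.02 ms per query
```

Even at the optimistic end, traversal over S3 lands in whole seconds per query, which is why the hot tier must stay in local memory and the cold tier is reserved for infrequent, latency-tolerant access.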
Editorial Opinion
This pragmatic pivot shows architectural wisdom. Vector database vendors rightly prioritized performance when the technology was unproven, but the market's evolution demands cost consciousness. Acknowledging that not all embeddings need millisecond latencies—especially in short-lived experiments and historical data—could finally make vector databases economically viable for mainstream AI teams drowning in sprawling, rarely queried embeddings.