ClickHouse Redesigns Full-Text Search Index for Object Storage Performance
Key Takeaways
- ▸ClickHouse redesigned its full-text index to optimize for object storage constraints, prioritizing sequential access over random reads
- ▸The new index design maintains high performance on both object storage and local disks through careful architectural decisions
- ▸The index consists of three components: dictionary file, sparse dictionary index file, and posting list file, each stored as separate files per data part
Summary
ClickHouse has redesigned its full-text search index to deliver high performance when data is stored on object storage rather than local disks. The new index design prioritizes sequential access patterns over random reads, addressing the fundamental performance differences between remote object storage and local disk storage. The redesigned index consists of three main components: a dictionary file, a sparse dictionary index file, and a posting list file, each stored separately per data part.
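To make the three-component layout concrete, here is a minimal in-memory sketch in Python of how a dictionary, a sparse index over that dictionary, and posting lists might cooperate during a lookup. It assumes the dictionary is sorted and split into fixed-size blocks, with the sparse index holding the first token of each block. The names `TextIndexPart`, `build_part`, `lookup`, and `BLOCK_SIZE`, as well as the block layout itself, are hypothetical illustrations, not ClickHouse's actual on-disk format or API.

```python
import bisect
from dataclasses import dataclass

# Hypothetical model of one data part's text index. Block size, field names,
# and layout are illustrative assumptions, not ClickHouse's real file format.
BLOCK_SIZE = 4  # tokens per dictionary block (assumed)


@dataclass
class TextIndexPart:
    dictionary_blocks: list  # "dictionary file": sorted (token, posting_id) pairs in blocks
    sparse_index: list       # "sparse dictionary index file": first token of each block
    postings: list           # "posting list file": sorted row numbers per posting_id


def build_part(docs):
    """Build the three structures for one data part from a list of row texts."""
    token_rows = {}
    for row, text in enumerate(docs):
        # Whitespace tokenization, lowercased; each row is recorded once per token.
        for token in set(text.lower().split()):
            token_rows.setdefault(token, []).append(row)
    tokens = sorted(token_rows)
    postings = [sorted(token_rows[t]) for t in tokens]
    entries = [(t, i) for i, t in enumerate(tokens)]
    blocks = [entries[i:i + BLOCK_SIZE] for i in range(0, len(entries), BLOCK_SIZE)]
    sparse = [block[0][0] for block in blocks]
    return TextIndexPart(blocks, sparse, postings)


def lookup(part, token):
    """Return matching rows: one sparse-index search, one block scan, one posting read."""
    token = token.lower()
    # 1. Binary search the small sparse index to pick the candidate block.
    b = bisect.bisect_right(part.sparse_index, token) - 1
    if b < 0:
        return []  # token sorts before every dictionary entry
    # 2. Scan one contiguous dictionary block sequentially.
    for t, posting_id in part.dictionary_blocks[b]:
        if t == token:
            # 3. Read one contiguous posting list.
            return part.postings[posting_id]
    return []


if __name__ == "__main__":
    part = build_part(["ClickHouse text index",
                       "object storage latency",
                       "text search on object storage"])
    print(lookup(part, "object"), lookup(part, "text"), lookup(part, "missing"))
```

The point of the sketch is the access pattern: each lookup touches one small, cacheable sparse index plus one contiguous range in each of the other two structures, instead of hopping across many scattered locations.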
The engineering team identified that latency, rather than bandwidth, is the real bottleneck when working with remote object storage. The previous text index design relied on scattered lookup patterns that were efficient on local disks but became slow on object storage, where each of many small, disjoint reads pays the full request latency. The new layout enables efficient full-text search on object storage while maintaining performance on local disks, allowing ClickHouse Cloud users to use native text indexing without performance degradation.
- ▸Latency, not bandwidth, is the primary bottleneck when data lives on remote object storage, driving the shift away from random lookup patterns
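A back-of-the-envelope cost model shows why request count, not bytes moved, dominates on remote storage. This Python sketch assumes every request pays a fixed round-trip latency plus transfer time and that requests are issued one after another. The latency and bandwidth figures are hypothetical placeholders, not measurements of ClickHouse Cloud or any specific object store, and real systems overlap requests, which narrows but does not remove the gap.

```python
def read_time_ms(num_requests, total_bytes, latency_ms, bandwidth_mb_per_s):
    """Estimated time for num_requests serial reads moving total_bytes in total.

    Each request pays latency_ms once; the payload is transferred at
    bandwidth_mb_per_s (1 MB = 1,000,000 bytes). Purely illustrative.
    """
    transfer_ms = total_bytes / (bandwidth_mb_per_s * 1_000_000) * 1000
    return num_requests * latency_ms + transfer_ms


if __name__ == "__main__":
    total = 4_000_000  # same 4 MB of index data fetched either way
    # Hypothetical profiles: (name, per-request latency in ms, bandwidth in MB/s)
    for name, latency, bandwidth in [("object storage", 50, 100), ("local disk", 0.1, 1000)]:
        scattered = read_time_ms(1000, total, latency, bandwidth)  # 1000 small reads
        sequential = read_time_ms(1, total, latency, bandwidth)    # one contiguous read
        print(f"{name}: scattered {scattered:.1f} ms, sequential {sequential:.1f} ms")
```

Under these assumed numbers, the transfer time is identical for both patterns, so the difference comes entirely from the per-request latency term, which is exactly the term a sequential layout minimizes.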
Editorial Opinion
This redesign represents practical engineering that acknowledges the reality of cloud-native databases: the performance characteristics of object storage are fundamentally different from those of local disks, and architectural decisions must reflect those constraints. By rethinking the index layout for sequential access while maintaining local disk performance, ClickHouse demonstrates how to build analytics infrastructure that is optimized for the cloud. This work should serve as a blueprint for other database systems grappling with similar challenges as they transition to cloud deployments.