Manticore Search Achieves 14× Faster Embeddings with ONNX Runtime Overhaul
Key Takeaways
- 14× average performance improvement: 70–230 docs/sec vs. 5–11 docs/sec on identical hardware
- Switched from SentenceTransformers/Candle to ONNX Runtime backend for better concurrency and resource utilization
- Single-row insert latency reduced from 200+ ms to ~14 ms; concurrent latency improved to ~56 ms
Summary
Manticore has shipped a major performance update to its Auto Embeddings feature, delivering roughly a 14× speedup by rebuilding how the system runs ONNX (Open Neural Network Exchange) models. The new ONNX Runtime backend, released in Manticore Search 27.1.5, replaces the previous SentenceTransformers/Candle inference path, cutting embedding latency and raising throughput.
The previous implementation struggled with concurrency bottlenecks and inefficient batching, limiting performance to just 5–11 documents per second regardless of hardware configuration or workload pattern. The optimized path now delivers 70–230 docs per second on the same 16-core hardware, with single-row insert latency dropping from 200+ ms to approximately 14 ms under single-client conditions and 56 ms under concurrent load.
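The latency and throughput figures above can be cross-checked with simple arithmetic. The sketch below assumes a single client issuing inserts back to back, with latency dominated by embedding time; the helper names are illustrative and do not come from Manticore. Because the old latency is reported as "200+ ms", the throughput derived from it is an upper bound.

```python
def docs_per_sec(latency_ms: float, batch_size: int = 1, clients: int = 1) -> float:
    """Throughput implied by a per-call latency, assuming calls run back to back."""
    return clients * batch_size * 1000.0 / latency_ms


def implied_latency_ms(throughput: float, batch_size: int = 1, clients: int = 1) -> float:
    """Per-call latency implied by a reported throughput (inverse of docs_per_sec)."""
    return clients * batch_size * 1000.0 / throughput


old_rate = docs_per_sec(200.0)   # old path: at most 5 docs/sec at 200+ ms per row
new_rate = docs_per_sec(14.0)    # new path: about 71 docs/sec at ~14 ms per row
speedup = new_rate / old_rate    # about 14.3x, in line with the reported 14x

# Peak figure from the article: 233 docs/sec, one client, batch size 64.
# The per-batch latency here is derived, not a reported measurement.
peak_batch_latency = implied_latency_ms(233.0, batch_size=64)

print(f"old: {old_rate:.1f} docs/sec, new: {new_rate:.1f} docs/sec, speedup: {speedup:.1f}x")
print(f"implied latency per 64-row batch at peak: {peak_batch_latency:.0f} ms")
```

Under these assumptions, the single-row numbers line up with the low end of both reported throughput ranges (5 and 70 docs/sec), and the higher figures come from batching.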
The engineering effort focused on two critical optimizations: disabling intra-operation spinning in the runtime (the busy-waiting that ONNX Runtime's intra-op worker threads do while idle) and eliminating inefficient document batching inside the worker process. These changes improved CPU utilization and thread scheduling, so throughput holds up across different concurrency levels. Peak throughput reaches 233 docs/sec with a single client thread and a batch size of 64.
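For readers unfamiliar with the spinning setting, here is a minimal sketch of how it can be turned off through the ONNX Runtime Python API. It is not Manticore's code, which the source does not show. The model path and thread count are hypothetical placeholders, and the `onnxruntime` package is assumed to be installed.

```python
# Session config entries passed to ONNX Runtime as string key/value pairs.
# "session.intra_op.allow_spinning" = "0" makes idle intra-op worker threads
# block instead of busy-waiting for the next piece of work.
SPIN_CONFIG = {"session.intra_op.allow_spinning": "0"}


def make_session(model_path: str, intra_op_threads: int = 1):
    """Create a CPU inference session with intra-op spinning disabled.

    model_path and intra_op_threads are illustrative; the values Manticore
    uses are not stated in the source.
    """
    import onnxruntime as ort  # imported lazily so the module loads without it

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = intra_op_threads
    for key, value in SPIN_CONFIG.items():
        opts.add_session_config_entry(key, value)
    return ort.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )


# Usage (hypothetical path to an exported embedding model):
# session = make_session("model.onnx", intra_op_threads=1)
```

Spinning trades CPU time for lower wake-up latency. When many sessions or requests share the same cores, spinning threads can compete with threads doing useful work, which is one plausible reason disabling it helps under concurrent load.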
For database applications relying on embeddings for semantic search, this update significantly improves ingest throughput and responsiveness. Because auto-embeddings run inside the database on every INSERT, embedding speed translates directly into INSERT speed, a critical metric for high-volume data ingestion workloads. Existing tables that use ONNX-capable models pick up the performance gains with no configuration changes.
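Because embedding time shows up as INSERT time, the effect can be measured from the client side. The following is a rough timing harness, not a reproduction of Manticore's benchmark: `insert_batch` is a hypothetical callable that you would implement with your own client to send one batch of rows to the server.

```python
import time
from typing import Callable, Dict, Sequence


def measure_ingest(
    insert_batch: Callable[[Sequence[str]], None],
    docs: Sequence[str],
    batch_size: int = 1,
) -> Dict[str, float]:
    """Send docs in batches and report throughput and mean per-call latency."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    calls = 0
    start = time.perf_counter()
    for i in range(0, len(docs), batch_size):
        insert_batch(docs[i:i + batch_size])  # one INSERT per batch
        calls += 1
    elapsed = time.perf_counter() - start

    return {
        "docs": len(docs),
        "calls": calls,
        "docs_per_sec": len(docs) / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_ms": 1000.0 * elapsed / calls if calls else 0.0,
    }


if __name__ == "__main__":
    # Stand-in for a real insert: sleeps ~1 ms per call instead of hitting a server.
    result = measure_ingest(lambda batch: time.sleep(0.001), ["doc"] * 20, batch_size=4)
    print(result)
```

Running it with batch sizes such as 1 and 64 against the same table gives single-client numbers comparable in kind to the ones quoted here. Measuring concurrent load would require several such loops running in parallel.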
- Zero breaking changes—existing tables using ONNX-capable models automatically use the faster path
- Released in Manticore Search 27.1.5; compatible with standard HuggingFace embedding models (MiniLM, BGE, E5, etc.)



