Manticore Search Achieves 14× Faster Embeddings with ONNX Runtime Overhaul

Key Takeaways

▸ 14× average performance improvement: 70–230 docs/sec vs. 5–11 docs/sec on identical hardware
▸ Switched from SentenceTransformers/Candle to an ONNX Runtime backend for better concurrency and resource utilization
▸ Single-row insert latency reduced from 200+ ms to ~14 ms with a single client, and to ~56 ms under concurrent load

Source: Hacker News, https://manticoresearch.com/blog/onnx-embeddings-speedup/

Summary

Manticore has shipped a major performance update to its Auto Embeddings feature, achieving an approximately 14× speedup by rebuilding how the system runs ONNX (Open Neural Network Exchange) models. The new ONNX Runtime backend, released in Manticore Search 27.1.5, replaces the previous SentenceTransformers/Candle inference path, cutting embedding latency and raising throughput.

The previous implementation suffered from concurrency bottlenecks and inefficient batching, which capped performance at just 5–11 docs/sec regardless of hardware configuration or workload pattern. The optimized path now delivers 70–230 docs/sec on the same 16-core hardware, with single-row insert latency dropping from 200+ ms to approximately 14 ms with a single client and approximately 56 ms under concurrent load.
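As a rough sanity check on these figures, the sketch below converts per-insert latency into throughput for one client issuing single-row inserts back to back. The function name and the assumption that latency is the only cost per insert are illustrative; this is back-of-the-envelope arithmetic, not the benchmark method used in the source.

```python
def docs_per_second(latency_ms: float, clients: int = 1) -> float:
    """Throughput if each client sends one single-row INSERT at a time,
    waiting latency_ms for each to finish (an idealised model)."""
    if latency_ms <= 0 or clients < 1:
        raise ValueError("latency_ms must be positive and clients at least 1")
    return clients * 1000.0 / latency_ms


# Latencies quoted in the summary; "200+ ms" is treated as exactly 200 ms.
old_rate = docs_per_second(200.0)  # previous SentenceTransformers/Candle path
new_rate = docs_per_second(14.0)   # ONNX Runtime path, single client
speedup = new_rate / old_rate

print(f"old: {old_rate:.1f} docs/sec")
print(f"new: {new_rate:.1f} docs/sec")
print(f"speedup: {speedup:.1f}x")
```

Under this simple model, 200 ms per insert gives about 5 docs/sec and 14 ms gives about 71 docs/sec, which lines up with the low end of both reported ranges and with a ratio of roughly 14.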

The engineering effort focused on two optimizations: disabling intra-operation spinning in the runtime and eliminating inefficient document batching inside the worker process. These changes improved CPU utilization and thread scheduling, so throughput holds up across different concurrency levels. Peak throughput reaches 233 docs/sec with a single client thread and a batch size of 64.
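For readers unfamiliar with the spinning setting, here is a minimal sketch of how it could be turned off through ONNX Runtime's Python binding. The source does not say which exact setting Manticore changed or which language binding it uses, so the config key `session.intra_op.allow_spinning`, the helper name `apply_no_spin`, and the model path in the comment are assumptions for illustration only.

```python
from typing import Dict, Protocol


class SupportsConfigEntries(Protocol):
    """Any object with ONNX Runtime's SessionOptions.add_session_config_entry."""

    def add_session_config_entry(self, key: str, value: str) -> None: ...


# Assumed mapping of "disable intra-operation spinning" onto an ONNX Runtime
# session config key; the source does not name the setting it changed.
NO_SPIN_ENTRIES: Dict[str, str] = {"session.intra_op.allow_spinning": "0"}


def apply_no_spin(options: SupportsConfigEntries) -> SupportsConfigEntries:
    """Write the no-spin entries onto a session-options object and return it."""
    for key, value in NO_SPIN_ENTRIES.items():
        options.add_session_config_entry(key, value)
    return options


# With the onnxruntime package installed, usage would look roughly like:
#   import onnxruntime as ort
#   opts = apply_no_spin(ort.SessionOptions())
#   session = ort.InferenceSession("model.onnx", sess_options=opts)  # hypothetical path
```

With spinning disabled, idle intra-op worker threads block instead of busy-waiting. That typically costs a little single-request latency but frees cores when many inference calls run at once, which fits the concurrency gains described above.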

For database applications that rely on embeddings for semantic search, this update significantly improves ingest throughput and responsiveness. Because auto-embeddings run inside the database on every INSERT, embedding speed translates directly into INSERT speed, a critical metric for high-volume ingestion workloads. Existing tables benefit automatically, with no configuration changes required.

▸ Zero breaking changes: existing tables using ONNX-capable models automatically use the faster path
▸ Released in Manticore Search 27.1.5; compatible with standard Hugging Face embedding models (MiniLM, BGE, E5, etc.)

UPDATE, Mantic, 2026-07-03
