BotBeat
UPDATE · Mantic · 2026-07-03

Manticore Search Achieves 14× Faster Embeddings with ONNX Runtime Overhaul

Key Takeaways

  • 14× average performance improvement: 70–230 docs/sec vs. 5–11 docs/sec on identical hardware
  • Switched from SentenceTransformers/Candle to an ONNX Runtime backend for better concurrency and resource utilization
  • Single-row insert latency reduced from 200+ ms to ~14 ms with a single client and ~56 ms under concurrent load
Source: Hacker News, https://manticoresearch.com/blog/onnx-embeddings-speedup/

Summary

Manticore has delivered a major performance update to its Auto Embeddings feature, achieving an approximately 14× speedup by rebuilding how the system runs ONNX (Open Neural Network Exchange) models. The new ONNX Runtime backend, released in Manticore Search 27.1.5, replaces the previous SentenceTransformers/Candle inference path, cutting embedding latency and raising throughput.

The previous implementation suffered from concurrency bottlenecks and inefficient batching, which capped performance at just 5–11 docs/sec regardless of hardware configuration or workload pattern. The optimized path now delivers 70–230 docs/sec on the same 16-core hardware, and single-row insert latency drops from 200+ ms to about 14 ms with a single client and about 56 ms under concurrent load.
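The quoted figures can be checked against the headline 14× claim with a little arithmetic. The sketch below is only a sanity check: pairing the low ends and the high ends of the two throughput ranges is an assumption, since the summary does not say which old measurement corresponds to which new one, and "200+ ms" is treated as a lower bound.

```python
# Figures quoted in the summary (docs/sec and milliseconds).
OLD_DOCS_PER_SEC = (5, 11)
NEW_DOCS_PER_SEC = (70, 230)
OLD_INSERT_MS = 200  # quoted as "200+ ms", so treated as a lower bound
NEW_INSERT_MS = 14   # quoted as "~14 ms" with a single client


def ratio(numerator: float, denominator: float) -> float:
    """Return how many times larger the numerator is than the denominator."""
    return numerator / denominator


# Assumption: range endpoints are paired low-with-low and high-with-high.
throughput_low = ratio(NEW_DOCS_PER_SEC[0], OLD_DOCS_PER_SEC[0])
throughput_high = ratio(NEW_DOCS_PER_SEC[1], OLD_DOCS_PER_SEC[1])
latency_gain = ratio(OLD_INSERT_MS, NEW_INSERT_MS)

print(f"throughput gain: {throughput_low:.1f}x to {throughput_high:.1f}x")
print(f"single-row latency: at least {latency_gain:.1f}x lower")
```

Under that pairing, both the low end of the throughput comparison and the single-client latency figure land close to the reported 14× average.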

The engineering effort focused on two optimizations: disabling intra-operation thread spinning in the runtime and eliminating inefficient document batching inside the worker process. These changes improved CPU utilization and thread scheduling, allowing the system to sustain high throughput across different concurrency levels. Peak throughput reaches 233 docs/sec with a single client thread and a batch size of 64.
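For readers who want to try the first optimization themselves, ONNX Runtime exposes intra-op spinning as a session config entry. The sketch below uses the Python bindings purely for illustration; the summary does not say which binding or which exact settings Manticore uses, so the thread count, the helper names, and the model path argument are all assumptions.

```python
# ONNX Runtime session config key; "0" turns intra-op thread spinning off.
SPIN_OFF = {"session.intra_op.allow_spinning": "0"}


def apply_config(session_options, entries):
    """Apply string key/value config entries to a SessionOptions-like object."""
    for key, value in entries.items():
        session_options.add_session_config_entry(key, value)
    return session_options


def make_session(model_path: str, intra_threads: int = 1):
    """Build an InferenceSession with spinning disabled (requires onnxruntime)."""
    # Imported lazily so the helpers above work without the package installed.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    # Hypothetical value: the right thread count depends on the workload.
    opts.intra_op_num_threads = intra_threads
    apply_config(opts, SPIN_OFF)
    return ort.InferenceSession(model_path, sess_options=opts)
```

With spinning disabled, idle intra-op worker threads block instead of busy-waiting, which trades a little wake-up latency for lower CPU contention when many sessions or requests run side by side.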

For database applications that rely on embeddings for semantic search, this update significantly improves ingest throughput and responsiveness. Because auto-embeddings run inside the database on every INSERT, embedding speed translates directly into INSERT speed, a critical metric for high-volume data ingestion workloads. Existing tables benefit from the performance gains automatically, with no configuration changes required.

  • Zero breaking changes: existing tables using ONNX-capable models automatically use the faster path
  • Released in Manticore Search 27.1.5; compatible with standard HuggingFace embedding models (MiniLM, BGE, E5, etc.)
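Because embedding speed sets INSERT speed, the quoted throughput figures map directly onto bulk-load time. A rough sketch, assuming a hypothetical corpus of 1,000,000 documents and a steady ingest rate (neither comes from the article):

```python
def ingest_hours(num_docs: int, docs_per_sec: float) -> float:
    """Hours needed to insert num_docs at a steady docs_per_sec rate."""
    return num_docs / docs_per_sec / 3600


CORPUS = 1_000_000  # hypothetical corpus size, not from the article

# Throughput figures are the range endpoints quoted in the summary.
RATES = [
    ("old path, 5 docs/sec", 5),
    ("old path, 11 docs/sec", 11),
    ("new path, 70 docs/sec", 70),
    ("new path, 230 docs/sec", 230),
]

for label, rate in RATES:
    print(f"{label}: {ingest_hours(CORPUS, rate):.1f} h")
```

At these rates, a load that would have taken one to two days on the old path finishes in a few hours or less on the new one.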
Natural Language Processing (NLP) · Machine Learning · MLOps & Infrastructure · Open Source
