BotBeat

RESEARCH | Databricks | 2026-05-05

Databricks Scales to 10 Trillion Monitoring Samples Per Day With Custom Infrastructure

Key Takeaways

  • Databricks' monitoring infrastructure now ingests 10 trillion samples per day and tracks 5 billion active timeseries across roughly 70 cloud regions
  • The company built Pantheon, a custom fork of the open-source Thanos project, because off-the-shelf solutions could not meet its scale and complexity requirements
  • The migration reduced monitoring downtime by 5x and saved millions in annual cloud costs
Source: Hacker News, https://www.databricks.com/blog/10-trillion-samples-day-scaling-beyond-traditional-monitoring-infra-databricks

Summary

Databricks has shared details of its custom monitoring platform, which now tracks 5 billion active timeseries in real time and ingests over 10 trillion samples per day, more than triple the volume of a year ago. The company found off-the-shelf monitoring solutions inefficient at this scale, so its engineers built a new platform on top of the open-source monitoring ecosystem, with customizations for Databricks' own needs.
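To put the headline figures in perspective, the short calculation below converts them into per-second and per-series averages. It is a back-of-the-envelope sketch that assumes the two round numbers (10 trillion samples per day, 5 billion active timeseries) are exact and that samples are spread evenly across series, which real scrape intervals will not be. The variable names are illustrative and do not come from Databricks.

```python
# Back-of-the-envelope averages derived from the article's round numbers.
# Assumption: figures are exact and samples are spread evenly over series.
SECONDS_PER_DAY = 24 * 60 * 60

samples_per_day = 10 * 10**12   # "over 10 trillion samples per day"
active_series = 5 * 10**9       # "5 billion active timeseries"

# Average ingest rate across the whole fleet.
samples_per_second = samples_per_day / SECONDS_PER_DAY

# Average number of samples each series contributes per day.
samples_per_series_per_day = samples_per_day / active_series

# Average spacing between samples of one series, in seconds.
implied_interval_seconds = SECONDS_PER_DAY / samples_per_series_per_day

print(f"{samples_per_second:,.0f} samples/second on average")
print(f"{samples_per_series_per_day:,.0f} samples per series per day")
print(f"{implied_interval_seconds:.1f} s average interval per series")
```

Under these assumptions the fleet ingests on the order of 116 million samples per second, and each series is sampled roughly every 43 seconds on average.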

The core challenges were managing monitoring across roughly 70 cloud regions spanning AWS, Azure, and Google Cloud while maintaining high reliability and supporting exponential growth driven by serverless and AI workloads. Traditional timeseries databases became a critical bottleneck requiring daily scaling operations. Databricks developed Pantheon, a fork of the CNCF's open-source Thanos project, which now powers 160+ instances globally and handles nearly 1,000 PromQL queries per second on the largest deployments.

The new architecture introduced metric aggregation to manage cardinality explosion caused by rapid infrastructure churn and integrated Databricks' lakehouse for dimensional troubleshooting. The migration to Pantheon reduced monitoring infrastructure downtime by 5x, eliminated significant manual operational toil, and saved millions in annual cloud costs. Databricks has contributed performance optimizations and edge case fixes back to the open-source Thanos community.
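The article does not describe how Pantheon's metric aggregation works internally. As a rough illustration of the general idea, the sketch below collapses series that differ only in a high-churn label by summing their values, which is similar in spirit to a PromQL `sum without (pod) (...)` expression. The label names (`service`, `region`, `pod`), the sample values, and the `aggregate` helper are all hypothetical.

```python
from collections import defaultdict

def aggregate(samples, drop_labels):
    """Sum sample values after removing the given labels from each series.

    Hypothetical helper: series that differ only in a dropped label
    collapse into a single aggregated series.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels, in sorted order, identify the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop_labels))
        totals[key] += value
    return dict(totals)

# Illustrative data: short-lived pods create one series each (assumed labels).
samples = [
    ({"service": "api", "region": "us-east-1", "pod": "api-7f9c"}, 12.0),
    ({"service": "api", "region": "us-east-1", "pod": "api-b21d"}, 8.0),
    ({"service": "api", "region": "eu-west-1", "pod": "api-03aa"}, 5.0),
]

aggregated = aggregate(samples, drop_labels={"pod"})
for key, total in sorted(aggregated.items()):
    print(dict(key), total)
```

In this toy example three per-pod series become two per-region series. The trade-off is that per-pod detail is no longer available in the aggregated metric, which is presumably one reason the architecture pairs aggregation with the lakehouse for dimensional troubleshooting.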

Editorial Opinion

Databricks' decision to build custom infrastructure rather than adopt commercial monitoring solutions reflects a critical threshold in the industry: at hyperscale, generic tools inevitably become a liability. More importantly, the company's commitment to contributing improvements back to Thanos demonstrates how even hyperscale companies can be responsible open-source stewards. As AI infrastructure complexity explodes and data workloads reach planetary scale, expect more companies to follow Databricks' playbook: start with open-source foundations and customize ruthlessly.

Data Science & Analytics, MLOps & Infrastructure, Open Source
