Seismograph: Open-Source Tool Detects Claude API Drift 38 Days Before Anthropic's Postmortem
Key Takeaways
- In a backtest of a real production incident, flagged semantic drift 38 days before the provider's postmortem, while the misrouting error still affected only 0.8% of requests and before the problem escalated into user-visible failures
- Privacy-preserving, trust-minimized architecture (differential privacy, cryptographic signing, quorum requirements) enables decentralized monitoring without exposing raw model inputs or outputs
- Fills a critical observability gap: conventional metrics miss semantic drift, and only content-aware statistical monitoring can detect silent behavior changes behind an unchanged endpoint
Summary
Seismograph is an open-source early-warning system that detects silent behavioral changes (semantic drift) in third-party LLM APIs, a critical blind spot in conventional monitoring. The tool continuously probes OpenAI-compatible endpoints and analyzes the responses with privacy-preserving statistics: each response is reduced to a SHA-256 hash and differentially private aggregates, so raw prompts and outputs are never exposed. In a reproducible backtest, Seismograph detected a degradation of Anthropic's Claude Sonnet 4 on August 10, 2025, caused by a context-window misrouting error that affected 0.8% of requests. That is 38 days before Anthropic published its official postmortem on September 17, and 19 days before the issue became visible to users.
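A minimal sketch of this kind of privacy-preserving reduction, assuming an observer that publishes, for each probe batch, only SHA-256 digests of the outputs plus one differentially private count. The count uses the standard Laplace mechanism: a counting query has sensitivity 1, so the noise scale is 1/epsilon. The function names, the record layout, the `epsilon` default, and the example feature are hypothetical; the summary does not describe Seismograph's actual aggregates or privacy parameters.

```python
import hashlib
import random
from typing import Callable, Iterable, Optional


def hash_response(text: str) -> str:
    """Reduce a raw model output to its SHA-256 hex digest; the text is not retained."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(flags: Iterable[bool], epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query (sensitivity 1, so noise scale = 1/epsilon)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sum(flags) + laplace_noise(1.0 / epsilon, rng)


def summarize_batch(
    outputs: list[str],
    feature: Callable[[str], bool],
    epsilon: float = 1.0,
    seed: Optional[int] = None,
) -> dict:
    """Hypothetical per-batch record: digests plus one noisy feature count, no raw text."""
    rng = random.Random(seed)
    return {
        "n": len(outputs),
        "digests": [hash_response(o) for o in outputs],
        "noisy_feature_count": dp_count((feature(o) for o in outputs), epsilon, rng),
    }


# Example: count outputs that look like JSON objects (an illustrative feature).
record = summarize_batch(
    ['{"a": 1}', '{"a": 1}', "plain text"],
    feature=lambda s: s.lstrip().startswith("{"),
    epsilon=1.0,
    seed=7,
)
print(record["n"], record["digests"][0] == record["digests"][1])  # 3 True
```

Identical outputs map to identical digests, so batches could be compared by digest without revealing any text, while the published count is perturbed so that no single response can be inferred from it. A smaller `epsilon` adds more noise and gives a stronger privacy guarantee.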
Built with Python, FastAPI, and statistical change-point detection (CUSUM and Bayesian online change-point algorithms), Seismograph addresses a fundamental asymmetry in AI infrastructure: standard monitoring (latency, uptime, error rates) cannot detect semantic drift, because a degrading model still returns HTTP 200 while its outputs change subtly, for example through shifted JSON fidelity, altered response distributions, or changed reasoning patterns. Every probe batch is signed with Ed25519, and an alert requires a quorum of independent observers, so noisy individual probes cannot trigger false alarms. The project ships with full CI (122 tests), an Apache-2.0 license, a Python SDK, and a live public dashboard that shows real-time drift detection across four production models.
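The detection and alerting logic described above can be sketched in Python, assuming each observer reduces every probe batch to one scalar metric (here a hypothetical JSON-fidelity rate). The sketch pairs a standard two-sided tabular CUSUM per observer with a minimum-count quorum across observers. The class and function names, the parameters `target`, `k`, `h`, and `quorum`, and the sample data are illustrative assumptions. The Bayesian online change-point detector and Ed25519 signing are omitted, and this is not Seismograph's own code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cusum:
    """Two-sided tabular CUSUM for one observer's per-batch metric.

    target: expected in-control mean; k: slack allowance; h: decision threshold.
    """
    target: float
    k: float
    h: float
    hi: float = 0.0  # cumulative evidence of an upward shift
    lo: float = 0.0  # cumulative evidence of a downward shift

    def update(self, x: float) -> bool:
        """Fold in one batch value; return True when either side exceeds h."""
        self.hi = max(0.0, self.hi + (x - self.target - self.k))
        self.lo = max(0.0, self.lo + (self.target - x - self.k))
        return self.hi > self.h or self.lo > self.h


def quorum_alert(fired: dict[str, bool], quorum: int) -> bool:
    """Alert only when at least `quorum` distinct observers have flagged drift."""
    return sum(fired.values()) >= quorum


def first_quorum_batch(detectors: dict[str, Cusum],
                       streams: dict[str, list[float]],
                       quorum: int) -> Optional[int]:
    """Return the first batch index at which the quorum is reached, or None."""
    fired = {name: False for name in detectors}
    n_batches = min(len(s) for s in streams.values())
    for step in range(n_batches):
        for name, det in detectors.items():
            if det.update(streams[name][step]):
                fired[name] = True  # latch: an observer stays flagged once it alarms
        if quorum_alert(fired, quorum):
            return step
    return None


# Hypothetical per-batch JSON-fidelity rates reported by three observers.
STREAMS = {
    "a": [0.95, 0.94, 0.80, 0.78, 0.79],  # sees the regression
    "b": [0.96, 0.95, 0.81, 0.80, 0.77],  # sees the regression
    "c": [0.95, 0.96, 0.94, 0.95, 0.96],  # unaffected observer
}


def make_detectors() -> dict[str, Cusum]:
    # Illustrative parameters; real thresholds would be tuned on baseline data.
    return {name: Cusum(target=0.95, k=0.02, h=0.1) for name in STREAMS}


print(first_quorum_batch(make_detectors(), STREAMS, quorum=2))  # 2
print(first_quorum_batch(make_detectors(), STREAMS, quorum=3))  # None
```

With a quorum of 2, the alert fires at batch 2, when observers `a` and `b` both cross the threshold; requiring all three never fires because `c` stays in control. Conversely, one observer whose own probes turn noisy can flag itself without raising a network-wide alert, which is the false-alarm protection the quorum is meant to provide.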
- Open-source and immediately deployable, with a Python SDK and a live dashboard; demonstrates the feasibility of distributed, trust-minimized auditing for third-party AI APIs



