BotBeat
RESEARCH | Independent Research | 2026-05-12

Silent-Bench Exposes Critical Silent Failures in LLM API Gateways: 47.96% Error Rate vs. 1.89% on Direct APIs

Key Takeaways

  • In one case study, a commercial LLM API gateway showed an error rate about 46 percentage points higher than direct API calls, which points to silent failures in the routing layer rather than in the upstream model
  • Silent-Bench provides cryptographically attested forensic auditing using Merkle trees and Ed25519 signatures, allowing independent verification of API behavior without trusting the auditor
  • Detected failures include response format errors (47.96% vs. 1.89%), token-billing inflation (~55%), and silent behavior changes across model deployments
Source: Hacker News, https://doi.org/10.5281/zenodo.20128451

Summary

A new cryptographically audited research framework called Silent-Bench has revealed that commercial LLM API gateways can produce silent failures (requests that appear successful at the HTTP layer but return semantically broken content) at rates dramatically higher than direct API calls to upstream models. In a case study of one unnamed gateway (Proxy-A), the error rate reached 47.96% for certain parameter configurations on one model, compared to just 1.89% when the identical request was sent directly to the upstream provider's API, a gap of approximately 46 percentage points.
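To make the idea of a silent failure concrete, here is a minimal Python sketch. It is not Silent-Bench's actual detection logic, which the article does not describe. It assumes, purely for illustration, that a well-formed response is a JSON object containing a hypothetical `answer` key, and it counts a response as a silent failure when the HTTP status is 200 but that check fails.

```python
import json

def is_silent_failure(status_code, body, required_key="answer"):
    """True when HTTP reports success but the payload is unusable.

    `required_key` is a hypothetical schema check chosen for this sketch.
    """
    if status_code != 200:
        return False  # a visible failure, not a silent one
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        return True  # 200 OK, but the body is not valid JSON
    return not (isinstance(payload, dict) and required_key in payload)

def silent_failure_rate(responses, required_key="answer"):
    """Percentage of (status_code, body) pairs that are silent failures."""
    if not responses:
        raise ValueError("need at least one response")
    failures = sum(is_silent_failure(s, b, required_key) for s, b in responses)
    return 100.0 * failures / len(responses)

def gap_in_points(gateway_rate, direct_rate):
    """Difference between two percentages, in percentage points."""
    return gateway_rate - direct_rate

# Made-up sample: one good response, two silent failures, one visible error.
sample = [
    (200, '{"answer": "ok"}'),
    (200, "not json"),
    (500, ""),
    (200, '{"other": 1}'),
]
print(silent_failure_rate(sample))
print(round(gap_in_points(47.96, 1.89), 2))
```

Applied to the reported figures, the gap is 47.96 - 1.89 = 46.07 percentage points, which matches the "approximately 46" in the summary.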

The research, conducted by independent researcher Wesam H. Al-Sabban and published with cryptographic attestation, introduces a methodology for detecting such failures and documenting them in a form that can withstand vendor pushback. The framework combines parameter-space sweeps, invisibility scans for hidden behaviors like token-billing inflation, and Merkle-tree hashing with Ed25519 signatures so that any third party can verify the findings independently. The case studies document failures in gateway routing layers and token-billing inflation of ~55% in one deployment, and they describe techniques for isolating effects across models.
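The Merkle-tree step can be sketched in a few lines of Python. This is an illustrative construction under stated assumptions, not the framework's actual scheme: the leaf and node prefixes, the use of SHA-256, and the rule of duplicating the last hash on odd levels are conventions chosen for this example. Signing the resulting root with Ed25519 would require a library outside the standard library, so that step is omitted here.

```python
import hashlib

def _leaf(data: bytes) -> bytes:
    # Prefix leaves with 0x00 so a leaf hash cannot be confused with a node hash.
    return hashlib.sha256(b"\x00" + data).digest()

def _node(left: bytes, right: bytes) -> bytes:
    # Prefix internal nodes with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(records):
    """Return the 32-byte Merkle root over a list of byte-string records."""
    if not records:
        raise ValueError("need at least one record")
    level = [_leaf(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical audit log entries; any change to any entry changes the root.
log = [b"request-1|response-1", b"request-2|response-2", b"request-3|response-3"]
print(merkle_root(log).hex())
```

The point of the design is that an auditor publishes only the signed root. Anyone holding the raw records can recompute the root and compare, so the findings can be checked without trusting the auditor's summary.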

The author has committed to a 90-day coordinated disclosure window, with vendor names anonymized until August 10, 2026. The framework and source code will be released publicly on GitHub by May 26, 2026, under the Apache-2.0 license. The research also documents methodological lessons, including the 'small-sample artifact pattern,' in which effect sizes estimated from fewer than 10 samples per condition are systematically inflated.
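One way to build intuition for the small-sample artifact is a quick simulation. The Python sketch below is an assumption-laden illustration, not the paper's analysis: it gives two conditions the same true error rate (a hypothetical 20%), so the true gap is zero, and measures how large the observed gap looks on average at different sample sizes.

```python
import random

def mean_abs_gap(n_per_condition, true_rate=0.2, trials=2000, seed=0):
    """Average absolute observed gap between two conditions whose true gap is zero."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(trials):
        errors_a = sum(rng.random() < true_rate for _ in range(n_per_condition))
        errors_b = sum(rng.random() < true_rate for _ in range(n_per_condition))
        total += abs(errors_a - errors_b) / n_per_condition
    return total / trials

small = mean_abs_gap(5)    # fewer than 10 samples per condition
large = mean_abs_gap(500)  # a much larger sample per condition
print(f"n=5:   mean |gap| = {small:.3f}")
print(f"n=500: mean |gap| = {large:.3f}")
```

With only five samples per condition, chance alone produces apparent gaps far larger than at 500 samples, even though no real difference exists. Whether this is the exact mechanism the author documents is not stated in the article.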

  • Framework will be open-sourced under Apache-2.0 with full reproduction commands and verification protocols; vendor identities to be disclosed August 10, 2026 under standard coordinated disclosure

Editorial Opinion

Silent-Bench addresses a critical blind spot in LLM deployment infrastructure: the assumption that HTTP-level success guarantees semantic correctness. By combining causal ablation, cryptographic proof, and methodological rigor (including documented retractions), Al-Sabban sets a gold standard for infrastructure auditing in the era of API-dependent AI systems. This research is essential reading for any organization routing production traffic through third-party LLM gateways, and once the framework is public, inviting independent audits should become a baseline expectation for gateway providers.

Data Science & Analytics, MLOps & Infrastructure, Cybersecurity, AI Safety & Alignment
