BotBeat
...
← Back

> ▌

OpenAIOpenAI
RESEARCHOpenAI2026-03-10

Security Researcher Discovers GPT-4 Leaks API Credentials Through Training Data Exposure

Key Takeaways

  • ▸GPT-4 leaks the EPHEMERAL_KEY credential at a 75% rate when prompted about secrets or initialization, despite attempting refusals
  • ▸The vulnerability is caused by OpenAI API documentation existing in training data, making real credentials the highest-probability output for security-related queries
  • ▸Refusal training exacerbates the problem by training the model to reference real examples from its corpus when declining to disclose information
Source:
Hacker Newshttps://news.ycombinator.com/item?id=47327833↗

Summary

A security researcher has identified a critical vulnerability in GPT-4 where the model repeatedly leaks internal API credentials—specifically an "EPHEMERAL_KEY" from OpenAI's Realtime API—through training data exposure. The researcher conducted the same security test four times with different prompts and achieved a 75% leak rate, with each test returning references to the same credential despite the model's attempted refusals to disclose it. The vulnerability stems from OpenAI's API documentation being present in GPT-4's training data, causing the model to associate security-related queries with real examples from its corpus rather than generating fictional credentials.

The researcher attributes the problem to the model's refusal training methodology, where the system learns to say "I cannot disclose [example secret]" using actual examples from its training data. This creates a systemic issue that cannot be patched without complete model retraining and affects all models trained on API documentation. The vulnerability demonstrates a concerning exploit path where attackers can discover credential names, probe for generation patterns, and potentially target client-side implementations for session hijacking. The cost to discover this vulnerability was minimal—approximately $0.04 through 60 tests—highlighting the accessibility of such security flaws.

  • The flaw is systemic and cannot be fixed through patching alone—it requires complete model retraining and affects all models trained on API documentation
  • This represents a scalable attack vector with minimal discovery cost, potentially enabling session hijacking and other downstream exploits

Editorial Opinion

This vulnerability exposes a fundamental tension in AI safety: refusal training that references real examples from training data may inadvertently become a disclosure mechanism rather than a protection mechanism. It highlights the urgent need for more sophisticated approaches to handling sensitive information during model training and the development of post-training techniques that don't rely on example-based refusals. As LLMs become increasingly integrated with APIs and services, the industry must establish better standards for what information should be included in training corpora and how to handle it securely.

Large Language Models (LLMs)CybersecurityAI Safety & AlignmentPrivacy & Data

More from OpenAI

OpenAIOpenAI
INDUSTRY REPORT

AI Chatbots Are Homogenizing College Classroom Discussions, Yale Students Report

2026-04-05
OpenAIOpenAI
FUNDING & BUSINESS

OpenAI Announces Executive Reshuffle: COO Lightcap Moves to Special Projects, Simo Takes Medical Leave

2026-04-04
OpenAIOpenAI
PARTNERSHIP

OpenAI Acquires TBPN Podcast to Control AI Narrative and Reach Influential Tech Audience

2026-04-04

Comments

Suggested

OracleOracle
POLICY & REGULATION

AI Agents Promise to 'Run the Business'—But Who's Liable When Things Go Wrong?

2026-04-05
AnthropicAnthropic
POLICY & REGULATION

Anthropic Explores AI's Role in Autonomous Weapons Policy with Pentagon Discussion

2026-04-05
PerplexityPerplexity
POLICY & REGULATION

Perplexity's 'Incognito Mode' Called a 'Sham' in Class Action Lawsuit Over Data Sharing with Google and Meta

2026-04-05
← Back to news
© 2026 BotBeat
AboutPrivacy PolicyTerms of ServiceContact Us