Security Researcher Discovers GPT-4 Leaks API Credentials Through Training Data Exposure

Key Takeaways

▸GPT-4 leaks the EPHEMERAL_KEY credential at a 75% rate when prompted about secrets or initialization, despite attempting refusals
▸The vulnerability is caused by OpenAI API documentation existing in training data, making real credentials the highest-probability output for security-related queries
▸Refusal training exacerbates the problem by training the model to reference real examples from its corpus when declining to disclose information

Source:

Hacker Newshttps://news.ycombinator.com/item?id=47327833↗

Summary

A security researcher has identified a critical vulnerability in GPT-4 where the model repeatedly leaks internal API credentials—specifically an "EPHEMERAL_KEY" from OpenAI's Realtime API—through training data exposure. The researcher conducted the same security test four times with different prompts and achieved a 75% leak rate, with each test returning references to the same credential despite the model's attempted refusals to disclose it. The vulnerability stems from OpenAI's API documentation being present in GPT-4's training data, causing the model to associate security-related queries with real examples from its corpus rather than generating fictional credentials.

The researcher attributes the problem to the model's refusal training methodology, where the system learns to say "I cannot disclose [example secret]" using actual examples from its training data. This creates a systemic issue that cannot be patched without complete model retraining and affects all models trained on API documentation. The vulnerability demonstrates a concerning exploit path where attackers can discover credential names, probe for generation patterns, and potentially target client-side implementations for session hijacking. The cost to discover this vulnerability was minimal—approximately $0.04 through 60 tests—highlighting the accessibility of such security flaws.

The flaw is systemic and cannot be fixed through patching alone—it requires complete model retraining and affects all models trained on API documentation
This represents a scalable attack vector with minimal discovery cost, potentially enabling session hijacking and other downstream exploits

Editorial Opinion

This vulnerability exposes a fundamental tension in AI safety: refusal training that references real examples from training data may inadvertently become a disclosure mechanism rather than a protection mechanism. It highlights the urgent need for more sophisticated approaches to handling sensitive information during model training and the development of post-training techniques that don't rely on example-based refusals. As LLMs become increasingly integrated with APIs and services, the industry must establish better standards for what information should be included in training corpora and how to handle it securely.

Security Researcher Discovers GPT-4 Leaks API Credentials Through Training Data Exposure

Key Takeaways

▸GPT-4 leaks the EPHEMERAL_KEY credential at a 75% rate when prompted about secrets or initialization, despite attempting refusals
▸The vulnerability is caused by OpenAI API documentation existing in training data, making real credentials the highest-probability output for security-related queries
▸Refusal training exacerbates the problem by training the model to reference real examples from its corpus when declining to disclose information

Summary

The flaw is systemic and cannot be fixed through patching alone—it requires complete model retraining and affects all models trained on API documentation
This represents a scalable attack vector with minimal discovery cost, potentially enabling session hijacking and other downstream exploits

Editorial Opinion

This vulnerability exposes a fundamental tension in AI safety: refusal training that references real examples from training data may inadvertently become a disclosure mechanism rather than a protection mechanism. It highlights the urgent need for more sophisticated approaches to handling sensitive information during model training and the development of post-training techniques that don't rely on example-based refusals. As LLMs become increasingly integrated with APIs and services, the industry must establish better standards for what information should be included in training corpora and how to handle it securely.

Security Researcher Discovers GPT-4 Leaks API Credentials Through Training Data Exposure

Key Takeaways

Summary

Editorial Opinion

More from OpenAI

OpenAI Prepares for IPO After Musk Lawsuit Threat Clears

OpenAI Model Solves 80-Year-Old Planar Unit Distance Problem, Disproving Long-Held Mathematical Assumption

OpenAI Prepares to File to Go Public in Coming Weeks

Comments

Suggested

Google DeepMind Launches Gemini 3.5 Flash: New Lightweight AI Model

SID Achieves Search Breakthrough with SID-1, Outperforming GPT-5 at 1k+ QPS Using Reinforcement Learning

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says

Security Researcher Discovers GPT-4 Leaks API Credentials Through Training Data Exposure

Key Takeaways

Summary

Editorial Opinion

More from OpenAI

OpenAI Prepares for IPO After Musk Lawsuit Threat Clears

OpenAI Model Solves 80-Year-Old Planar Unit Distance Problem, Disproving Long-Held Mathematical Assumption

OpenAI Prepares to File to Go Public in Coming Weeks

Comments

Suggested

Google DeepMind Launches Gemini 3.5 Flash: New Lightweight AI Model

SID Achieves Search Breakthrough with SID-1, Outperforming GPT-5 at 1k+ QPS Using Reinforcement Learning

Advanced AI Models Bring Government to 'Reflection Point,' CIA Official Says