Ghost Couples: Study Reveals How LLMs Generate Recurring Fictional Authors That Contaminate Academic Publishing

Key Takeaways

▸LLMs generate correlated character ensembles whose co-occurrence rates far exceed chance and remain consistent across independent generations
▸These ghost author patterns are model-family-specific and version-specific, creating detectable temporal fingerprints of model development
▸1,655 ghost-authored papers with fabricated metadata have been registered on Zenodo with authentic, harvestable DataCite DOIs

Source: Hacker News, https://arxiv.org/abs/2606.02184

Summary

A new arXiv paper reports that major large language models consistently generate correlated pairs and trios of fictional characters that recur across hundreds of independently produced AI-generated documents. These 'ghost couples' are not random. They are specific to each model family: Claude reliably produces Elena Vasquez, Marcus Chen, and Amara Okafor; Gemini generates Aris Thorne and Lena Petrova; and GPT consistently uses Elara Voss. The patterns are also version-specific and are actively suppressed at model release boundaries, leaving detectable fingerprints in content production timelines.
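
The claim that these names co-occur "far in excess of chance" can be made concrete with a lift ratio: the fraction of documents containing both names, divided by the product of their individual frequencies (which is what independence would predict). The sketch below is a minimal illustration, assuming each generated document has already been reduced to a set of character names. The toy corpus is hypothetical and is not data from the paper, and the paper's own statistic may differ from this one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is reduced to the set of names it contains.
TOY_DOCS = [
    {"Elena Vasquez", "Marcus Chen"},
    {"Elena Vasquez", "Marcus Chen", "Amara Okafor"},
    {"Aris Thorne", "Lena Petrova"},
    {"Elara Voss"},
]


def cooccurrence_lift(docs: list[set[str]]) -> dict[tuple[str, str], float]:
    """Return lift = P(a and b) / (P(a) * P(b)) for every pair seen together.

    A lift near 1.0 means the pair co-occurs about as often as independence
    predicts. A lift well above 1.0 means the names travel together.
    """
    n = len(docs)
    single = Counter()
    pair = Counter()
    for names in docs:
        single.update(names)
        # sorted() gives each unordered pair a single canonical key
        pair.update(combinations(sorted(names), 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }


lift = cooccurrence_lift(TOY_DOCS)
print(lift[("Elena Vasquez", "Marcus Chen")])  # 2.0
print(lift[("Aris Thorne", "Lena Petrova")])   # 4.0
```

Lift is only meaningful over a large corpus. With four toy documents almost any pair looks elevated, so a real analysis would also need a sizable baseline of documents in which the names appear independently.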

The research also documents a serious real-world consequence at scale: the researchers identified 1,655 ghost-authored records with fabricated publication dates on Zenodo, the scholarly repository operated by CERN. Critically, 991 of these records were registered within a single month, and all carry authentic DataCite DOIs, the persistent identifiers that scholarly databases use to index and harvest papers. Server-side timestamps prove deliberate backdating. Ghost names also appear on ResearchGate, where they form synthetic research groups spanning multiple LLM model families, and their publication dates serve as reliable temporal proxies for model deployment windows.
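
The backdating evidence rests on a simple comparison: if a record's declared publication date falls well before the repository's own server-side creation timestamp, the record was registered after the fact. Grouping records by creation month likewise exposes bursts such as the 991 registrations in a single month. The sketch below is a minimal illustration, assuming the records have been fetched as dictionaries shaped like Zenodo's REST API output (a top-level `created` timestamp and a `metadata.publication_date` string). The sample records, their IDs, and the 30-day threshold are hypothetical, not values from the study.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hypothetical records. The assumption is that "created" is the server-side
# timestamp and "metadata.publication_date" is the date the uploader declared.
RECORDS = [
    {"id": 1, "created": "2025-03-14T09:12:00+00:00",
     "metadata": {"publication_date": "2021-06-01"}},
    {"id": 2, "created": "2025-03-20T17:45:00+00:00",
     "metadata": {"publication_date": "2022-11-15"}},
    {"id": 3, "created": "2024-01-05T08:00:00+00:00",
     "metadata": {"publication_date": "2024-01-04"}},
]


def backdating_gap(record: dict) -> timedelta:
    """Server-side creation date minus the declared publication date."""
    created = datetime.fromisoformat(record["created"]).date()
    declared = date.fromisoformat(record["metadata"]["publication_date"])
    return created - declared


def flag_backdated(records: list[dict], threshold_days: int = 30) -> list[int]:
    """IDs of records whose declared date precedes registration by more than the threshold."""
    limit = timedelta(days=threshold_days)
    return [r["id"] for r in records if backdating_gap(r) > limit]


def registrations_per_month(records: list[dict]) -> Counter:
    """Count records by the YYYY-MM prefix of their server-side creation timestamp."""
    return Counter(r["created"][:7] for r in records)


print(flag_backdated(RECORDS))           # [1, 2]
print(registrations_per_month(RECORDS))  # Counter({'2025-03': 2, '2024-01': 1})
```

On its own, a large gap is only a signal. Legitimate deposits of older work also carry earlier publication dates, so a check like this works as a screening step that flags candidates for review rather than as proof.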

▸Ghost authors are suppressed at model release boundaries, indicating intentional removal in newer versions
▸Synthetic papers with real DOIs contaminate scholarly aggregators and academic metadata systems at scale

Editorial Opinion

This research exposes a critical vulnerability in AI deployment: language models leak correlated character priors into persistent scholarly infrastructure that assigns real, harvestable identifiers. The finding that more than 1,600 ghost-authored papers now carry valid DOIs and have entered academic databases is both a technical curiosity and a genuine threat to the integrity of knowledge systems. It shows how AI-generated content can systematically pollute permanent records when no detection mechanisms are in place. Publishers and repositories must urgently implement stronger validation and AI-detection protocols before ghost authors become indistinguishable from legitimate scholarship.
