The Ghost Couple: How AI Models Develop Correlated Naming Biases

Key Takeaways

▸LLMs exhibit reproducible, model-family-specific naming biases when generating fictional characters—Claude, Gemini, and GPT each have distinct preferred name ensembles
▸These biases appear to be actively suppressed at model updates, which suggests companies are aware of the issue, yet they persist in deployed models
▸Ghost-authored academic papers with real DOIs are now systematically contaminating scholarly repositories at scale (1,655+ records on Zenodo alone)

Source:

Hacker News: https://arxiv.org/abs/2606.02184

Summary

A new research paper reveals that large language models don't generate names randomly—they exhibit model-family-specific preferences for certain fictional identities. Claude consistently generates Elena Vasquez, Marcus Chen, and Amara Okafor (the "Ghost Couple" and their partner) as academic collaborators across independent documents; Gemini favors Aris Thorne and Lena Petrova; GPT defaults to Elara Voss. These naming biases are version-specific and leave dateable behavioral fingerprints that can identify which model generated a piece of content.
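To make the fingerprinting idea concrete, here is a minimal sketch of how a record's author list might be checked against the name ensembles listed above. The names come from the summary; the dictionary layout, the `match_families` helper, and the sample author list are assumptions made for illustration and are not the paper's method. A match is weak evidence at best, since real people can share these names.

```python
from collections import Counter

# Name ensembles as reported in the summary; the grouping into a dict is an assumption.
NAME_ENSEMBLES = {
    "Claude": {"Elena Vasquez", "Marcus Chen", "Amara Okafor"},
    "Gemini": {"Aris Thorne", "Lena Petrova"},
    "GPT": {"Elara Voss"},
}


def match_families(authors):
    """Count how many listed authors fall in each family's name ensemble."""
    # Collapse repeated whitespace and ignore case before comparing.
    normalized = {" ".join(a.split()).casefold() for a in authors}
    hits = Counter()
    for family, names in NAME_ENSEMBLES.items():
        overlap = sum(1 for name in names if name.casefold() in normalized)
        if overlap:
            hits[family] = overlap
    return hits


# Hypothetical author list, for illustration only.
example_authors = ["Elena Vasquez", "Marcus  Chen", "J. Doe"]
print(match_families(example_authors))
```

Counting co-occurring names rather than single hits reflects the summary's point that the names appear together as collaborators; two or more names from one ensemble in a single record is a stronger signal than one.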

The research documents a serious downstream consequence: 1,655 ghost-authored papers now exist on Zenodo (CERN's repository) with fabricated journal names and backdated publication dates. Critically, these records carry real DOIs registered in DataCite, which makes them harvestable by scholarly aggregators and contaminates the academic literature. The researchers traced 991 records registered in a single month, using publication dates as temporal proxies for model deployment windows. The work exposes a concerning gap between developers' apparent awareness of these biases (which are actively suppressed at release boundaries) and the models' continued generation of convincing false identities at scale.

▸Publication dates on ghost-authored papers can serve as reliable temporal proxies for model deployment windows, creating dateable fingerprints
▸Real DOIs in DataCite make these ghost records harvestable by any scholarly aggregator, embedding misinformation directly into academic infrastructure
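The temporal-proxy idea can be sketched as a simple monthly tally of record dates, where a sharp spike (such as the 991 records in a single month mentioned above) would point to a likely deployment window. This is a rough illustration under stated assumptions: the helper names and the sample dates are hypothetical, and which date field to use (stated publication date or DOI registration date) is a choice the summary does not settle.

```python
from collections import Counter
from datetime import date


def records_per_month(iso_dates):
    """Tally ISO-formatted dates (YYYY-MM-DD) into YYYY-MM buckets."""
    counts = Counter()
    for iso in iso_dates:
        d = date.fromisoformat(iso)
        counts[f"{d.year:04d}-{d.month:02d}"] += 1
    return counts


def peak_month(iso_dates):
    """Return (month, count) for the busiest month, or None if there are no dates."""
    counts = records_per_month(iso_dates)
    return counts.most_common(1)[0] if counts else None


# Made-up dates for illustration; these are not records from the study.
sample_dates = ["2025-03-02", "2025-03-17", "2025-04-01"]
print(peak_month(sample_dates))
```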

Editorial Opinion

This research exposes a troubling tension in AI deployment: companies appear to be aware of these naming biases internally (evident from their active suppression at model releases), yet they continue to ship models that generate convincing false identities at scale. The real damage isn't the quirk itself. It's that these biases now contaminate academic infrastructure with officially registered DOIs, creating permanent, harvestable misinformation. Until companies address the root cause rather than suppressing symptoms at release boundaries, every new model version will continue leaving ghost authors in its wake.
