The Ghost Couple: How AI Models Develop Correlated Naming Biases
Key Takeaways
- LLMs exhibit reproducible, model-family-specific naming biases when generating fictional characters: Claude, Gemini, and GPT each have a distinct set of preferred names
- The biases are suppressed at model updates, which suggests vendors are aware of the issue, yet they persist in deployed models
- Ghost-authored academic papers with real DOIs are contaminating scholarly repositories at scale (1,655+ records on Zenodo alone)
Summary
A new research paper reports that large language models do not pick names at random. Instead, each model family prefers a small set of fictional identities. Claude repeatedly generates Elena Vasquez, Marcus Chen, and Amara Okafor (the "Ghost Couple" and their partner) as academic collaborators across independent documents; Gemini favors Aris Thorne and Lena Petrova; GPT defaults to Elara Voss. The biases are version-specific, so they leave dateable behavioral fingerprints that can indicate which model generated a piece of content.
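To make the fingerprint idea concrete, here is a minimal sketch of matching an author list against the name sets reported above. The names come from the summary; the `NAME_ENSEMBLES` mapping and the `guess_model_family` helper are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Name sets as reported in the summary. The data structure and the scoring
# rule below are illustrative assumptions, not the researchers' method.
NAME_ENSEMBLES = {
    "Claude": {"Elena Vasquez", "Marcus Chen", "Amara Okafor"},
    "Gemini": {"Aris Thorne", "Lena Petrova"},
    "GPT": {"Elara Voss"},
}


def guess_model_family(authors):
    """Return (family, hits) for the family with the most matching names.

    Returns None when no listed name appears among the authors.
    """
    # Collapse repeated whitespace and ignore case before comparing.
    normalized = {" ".join(a.split()).casefold() for a in authors}
    hits = Counter()
    for family, names in NAME_ENSEMBLES.items():
        hits[family] = sum(1 for n in names if n.casefold() in normalized)
    family, count = hits.most_common(1)[0]
    return (family, count) if count > 0 else None
```

A match is only a signal, since real researchers can share these names. Any practical detector would need further evidence, such as several of the names co-occurring on one record.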
The research also documents a serious downstream consequence: 1,655 ghost-authored papers now exist on Zenodo (the repository operated by CERN), with fabricated journal names and backdated publication dates. Critically, these records carry real DOIs registered in DataCite, which makes them harvestable by scholarly aggregators and lets them contaminate the academic literature. The researchers traced 991 records registered in a single month, using publication dates as temporal proxies for model deployment windows. The work exposes a concerning gap between vendors' apparent awareness of these biases (which are suppressed at release boundaries) and the models' continued generation of convincing false identities at scale.
- Publication dates on ghost-authored papers can serve as reliable temporal proxies for model deployment windows, creating dateable fingerprints
- Real DOIs in DataCite make these ghost records harvestable by any scholarly aggregator, embedding misinformation directly into academic infrastructure
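The use of publication dates as temporal proxies can be sketched as a simple bucketing step: count records per calendar month and look for spikes. This is a minimal illustration that assumes dates arrive as ISO `YYYY-MM-DD` strings; the function names are hypothetical and the paper's actual pipeline is not described in this summary.

```python
from collections import Counter
from datetime import date


def records_per_month(publication_dates):
    """Count records per (year, month) from ISO 'YYYY-MM-DD' strings."""
    counts = Counter()
    for raw in publication_dates:
        d = date.fromisoformat(raw)  # raises ValueError on malformed input
        counts[(d.year, d.month)] += 1
    return counts


def busiest_month(publication_dates):
    """Return ((year, month), count) for the densest month, or None if empty."""
    counts = records_per_month(publication_dates)
    return counts.most_common(1)[0] if counts else None
```

Called on a list of harvested dates, `busiest_month` returns the month with the largest cluster of records, which is the kind of single-month spike the summary describes.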
Editorial Opinion
This research exposes a troubling tension in AI deployment: companies appear to be aware of these naming biases internally (as their suppression at model releases suggests), yet they continue to ship models that generate convincing false identities at scale. The real damage is not the quirk itself. It is that these biases now contaminate academic infrastructure through officially registered DOIs, creating permanent, harvestable misinformation. Until companies address the root cause rather than patching symptoms at release boundaries, every new model version will continue to leave ghost authors in its wake.



