Encyclopedia Britannica and Merriam-Webster Sue OpenAI for 'Massive Copyright Infringement'
Key Takeaways
- Britannica alleges OpenAI trained ChatGPT on nearly 100,000 copyrighted articles without permission and continues to use the content in RAG systems
- The lawsuit adds to mounting legal pressure on OpenAI from publishers and writers over copyright infringement and content monetization
- Legal precedent remains unclear: while LLM training may qualify as transformative use, companies still face liability for how they acquire training data
Summary
Encyclopedia Britannica and Merriam-Webster have filed a lawsuit against OpenAI, alleging the AI company scraped nearly 100,000 copyrighted articles without permission to train its large language models. The lawsuit claims OpenAI violates copyright law both during model training and when ChatGPT generates verbatim reproductions of Britannica content or uses it in retrieval-augmented generation (RAG) workflows. Additionally, Britannica alleges OpenAI violates the Lanham Act by generating false information and attributing it to the publisher, and argues that ChatGPT's responses directly compete with and undermine the revenue of quality content publishers.
The lawsuit is part of a broader legal campaign against OpenAI over copyright issues, joining actions by The New York Times, Ziff Davis, and numerous newspapers across the U.S. and Canada. There is no strong legal precedent establishing whether using copyrighted content to train LLMs constitutes infringement, and recent cases have shown mixed results. Anthropic convinced a federal judge that using content for training is transformative, yet the company still agreed to a $1.5 billion settlement for illegally downloading books rather than licensing them.
- Britannica claims ChatGPT's verbatim reproductions and hallucinations harm both publisher revenue and the public's access to trustworthy information
Editorial Opinion
This lawsuit highlights a critical tension in the AI era: while training on broad internet content enables powerful LLMs, the current legal and business framework fails to fairly compensate creators whose work fueled that training. The copyright battle against OpenAI reflects a deeper question about whether transformative AI use should come at the cost of content creators' livelihoods, and whether hallucinations that falsely cite sources constitute a distinct harm requiring new legal remedies.