BotBeat

DeepSeek · RESEARCH · 2026-04-01

Research Reveals Finetuning Bypasses Copyright Protections in Major LLMs, Enabling Verbatim Recall of Books

Key Takeaways

  • Finetuning on a commercially viable task (plot-summary expansion) bypasses alignment protections in GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1, enabling extraction of up to 85-90% of copyrighted books verbatim
  • Model weights demonstrably store copies of training data, contradicting assurances AI companies have given courts and regulators that models do not retain their training data
  • The vulnerability is industry-wide: models from different providers memorize the same books in the same regions, suggesting systemic rather than provider-specific flaws
Source: Hacker News, https://arxiv.org/abs/2603.20957

Summary

A new research paper demonstrates that finetuning can bypass safety alignment measures in leading large language models, causing GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 to reproduce up to 85-90% of copyrighted books verbatim. The researchers finetuned the models on a plot-summary expansion task, a commercially viable application, and then prompted them with only semantic descriptions of a book's content, never its actual text, to trigger reproduction of protected works.
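The article does not say how the "85-90% verbatim" figure is computed. Purely as an illustration, one simple way to quantify verbatim reproduction is to count the fraction of a book's words that fall inside some run of at least k consecutive words that also appears in the model's output. The function name `verbatim_coverage`, the whitespace tokenization, and the threshold `k` are assumptions made for this sketch; it is not the paper's metric.

```python
# Hypothetical illustration only: NOT the metric used in the paper.
# Measures the fraction of reference words covered by any k-word span
# that also appears verbatim (same word order) in the generated text.

def verbatim_coverage(reference: str, generated: str, k: int = 8) -> float:
    """Return the fraction of words in `reference` that lie inside a
    k-word window also present verbatim in `generated`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ref = reference.split()  # naive whitespace tokenization (assumption)
    gen = generated.split()
    if len(ref) < k or len(gen) < k:
        return 0.0

    # Every k-word window of the generated text, for O(1) lookup.
    gen_ngrams = {tuple(gen[i:i + k]) for i in range(len(gen) - k + 1)}

    # Mark each reference word that falls inside a matching window.
    covered = [False] * len(ref)
    for i in range(len(ref) - k + 1):
        if tuple(ref[i:i + k]) in gen_ngrams:
            for j in range(i, i + k):
                covered[j] = True

    return sum(covered) / len(ref)


if __name__ == "__main__":
    book = "the quick brown fox jumps over the lazy dog"
    output = "a quick brown fox jumps over a cat"
    # Words 2-6 of `book` fall inside matching 4-word windows: 5 of 9 words.
    print(verbatim_coverage(book, output, k=4))
```

A larger `k` makes the measure stricter, since short common phrases stop counting as matches; a practical measurement would likely also normalize casing and punctuation, which this sketch skips.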

The study finds that model weights store copies of training data, despite industry claims to the contrary, and that safety mechanisms, including RLHF, system prompts, and output filters, can be circumvented through finetuning. The effect generalizes across authors and providers: models finetuned on one author's works unlock recall of books by dozens of unrelated authors, and three major models from different companies memorize the same passages at the same locations in the same books, indicating an industry-wide vulnerability.

These findings directly challenge the legal defenses used by frontier AI companies in copyright infringement cases, particularly arguments, accepted by courts, that safety measures adequately prevent reproduction of protected expression. The research suggests that recent fair-use rulings that made favorable outcomes conditional on the adequacy of such safeguards may have rested on incomplete assessments of model capabilities.

  • Generalization across authors shows that finetuning on one author's work reactivates latent memorization of unrelated works in the training corpus
  • The findings undermine legal defenses in copyright cases that relied on claims about the efficacy of safety measures, potentially affecting recent fair-use rulings

Editorial Opinion

This research exposes a critical gap between AI companies' legal assurances and technical reality, showing that widely deployed safety mechanisms are far more fragile than publicly claimed. That substantial portions of copyrighted works can be extracted through seemingly innocuous finetuning tasks raises serious questions about both the integrity of previous court proceedings and the adequacy of current model governance. The industry-wide nature of this vulnerability suggests that it reflects fundamental architectural issues rather than isolated oversights. It demands urgent regulatory scrutiny and may warrant reconsidering how much weight courts give to AI companies' testimony about their safety capabilities.

Large Language Models (LLMs) · Regulation & Policy · Ethics & Bias · AI Safety & Alignment · Privacy & Data

© 2026 BotBeat