BotBeat

INDUSTRY REPORT | DeepSeek | 2026-05-28

Amnesty International Report Exposes Unlawful Data Scraping and Privacy Violations in Generative AI Training

Key Takeaways

  • Leading AI companies are conducting large-scale, non-consensual data extraction from web sources to train generative AI models, violating privacy rights at scale
  • Training data sourced from the web perpetuates and amplifies real-world biases, causing disproportionate harm to marginalized communities regarding racial, gender, and cultural representation
  • The infrastructure requirements for large generative AI models carry significant environmental costs and drive resource extraction that disproportionately affects historically marginalized communities
Source: Hacker News (https://www.amnesty.org/en/latest/news/2026/05/global-enormous-data-pipelines-powering-major-generative-ai-systems-are-rooted-in-mass-invasions-of-privacy-by-design/)

Summary

Amnesty International released a briefing today titled "Unlawful by Design" documenting serious privacy violations in how leading generative AI companies extract and use data to train their models. The report examined data scraping practices used by OpenAI (GPT-3), Google (Gemini), Meta (Llama), DeepSeek, Midjourney, and Stable Diffusion, finding that these companies are extracting billions of personal data points from public web sources without explicit consent from the individuals who appear in or created the content.

The report argues that this approach to data collection builds privacy violations into the systems themselves, enabling "mass invasions of privacy" that make these systems "unlawful by design." Beyond privacy concerns, the extraction and use of web-sourced training data amplifies biases in model outputs, with significant negative consequences for historically marginalized communities, particularly by reinforcing racial, gender, and cultural prejudices.

Amnesty International also highlights the environmental costs of training large generative AI models, which require massive energy and water consumption to power data centers. The organization calls for urgent regulatory action to address what it describes as "egregious practices" and argues that alternative trajectories of technology development are possible if authorities course-correct promptly.

  • Amnesty International calls for urgent regulatory intervention to enforce privacy-by-design principles and halt unlawful data practices in AI development

Editorial Opinion

This report highlights a critical blind spot in the AI industry: the assumption that because data is publicly available online, it can be extracted and used without consent. Amnesty International's documentation of privacy violations and bias amplification across major AI platforms reveals that current approaches to generative AI development are fundamentally extractive and harm vulnerable communities. Regulatory frameworks must evolve quickly to enforce privacy-by-design requirements and hold companies accountable for the downstream harms of their training data practices.

Generative AI, Regulation & Policy, Ethics & Bias, AI Safety & Alignment, Privacy & Data
