Amnesty International Report Exposes Unlawful Data Scraping and Privacy Violations in Generative AI Training

Key Takeaways

▸Leading AI companies are extracting data from web sources on a massive scale and without consent to train generative AI models, violating privacy rights
▸Training data sourced from the web perpetuates and amplifies real-world biases, causing disproportionate harm to marginalized communities regarding racial, gender, and cultural representation
▸The infrastructure requirements for large generative AI models carry significant environmental costs and resource extraction that disproportionately affects historically marginalized communities

Source: Hacker News, https://www.amnesty.org/en/latest/news/2026/05/global-enormous-data-pipelines-powering-major-generative-ai-systems-are-rooted-in-mass-invasions-of-privacy-by-design/

Summary

Amnesty International released a briefing today titled "Unlawful by Design" documenting serious privacy violations in how leading generative AI companies extract and use data to train their models. The report examined the data scraping practices behind OpenAI's GPT-3, Google's Gemini, Meta's Llama, DeepSeek, Midjourney, and Stable Diffusion, and found that their developers are extracting billions of personal data points from public web sources without the explicit consent of the people who create or appear in that content.

The report argues that this approach to data collection builds "mass invasions of privacy" into these systems by design, making them "unlawful by design." Beyond privacy, the report finds that extracting and training on web-sourced data amplifies biases in model outputs, with significant negative consequences for historically marginalized communities, particularly through racial, gender, and cultural prejudice.

Amnesty International also highlights the environmental costs of training large generative AI models, which depend on data centers that consume vast amounts of energy and water. The organization calls for urgent regulatory action to address what it describes as "egregious practices" and argues that alternative paths for technology development remain possible if authorities course-correct promptly.

Amnesty International calls for urgent regulatory intervention to enforce privacy-by-design principles and halt unlawful data practices in AI development.

Editorial Opinion

This report highlights a critical blind spot in the AI industry: the assumption that because data is publicly available online, it can be extracted and used without consent. Amnesty International's documentation of privacy violations and bias amplification across major AI platforms reveals that current approaches to generative AI development are fundamentally extractive and harm vulnerable communities. Regulatory frameworks must evolve quickly to enforce privacy-by-design requirements and hold companies accountable for the downstream harms of their training data practices.
