BotBeat

POLICY & REGULATION | GitHub | 2026-03-26

GitHub Reverses Course, Will Train AI Models on User Data by Default Starting April 24

Key Takeaways

  • GitHub will begin training AI models on user data (code snippets, inputs, outputs, and context) by default starting April 24, 2026, for Copilot Free, Pro, and Pro+ users
  • The policy is opt-out: users can disable training on their data in privacy settings, an approach that differs from the stricter opt-in requirements typical in Europe
  • Private repositories are no longer fully private when Copilot is enabled, because code snippets can be collected for AI training
Source: Hacker News (https://www.theregister.com/2026/03/26/github_ai_training_policy_changes/)

Summary

Microsoft's GitHub announced it will begin using customer interaction data to train its AI models starting April 24, 2026, marking a significant policy reversal. The change applies to Copilot Free, Pro, and Pro+ users, while Copilot Business and Enterprise customers remain exempt. The data collected includes code snippets, inputs, outputs, file names, comments, and user interactions with Copilot features, from both public and private repositories.

Users can opt out in their privacy settings; the policy follows an opt-out model rather than the opt-in requirements typical in Europe. GitHub's Chief Product Officer Mario Rodriguez argued that the data collection will improve model accuracy and code suggestions. The policy shift has nonetheless drawn significant community backlash, with users questioning whether "private" repositories remain private and raising concerns about consent, even though the Codex model underlying GitHub Copilot was already trained on publicly available GitHub code.

  • The move has generated substantial community pushback, with GitHub users expressing concerns about consent and data privacy despite GitHub's claim that similar practices are industry standard

Editorial Opinion

GitHub's reversal on AI training data represents a troubling normalization of data extraction in the AI industry. While the company frames this as an opt-out choice, the fundamental issue remains: developers who depend on Copilot must actively opt out to protect their code, rather than being asked for explicit consent. That Copilot itself was built on previously scraped GitHub code illustrates how the AI industry has systematically extracted value from developers without meaningful consent, a precedent that makes this new policy feel inevitable rather than justified.

Generative AI, Ethics & Bias, Privacy & Data
