BotBeat

Google / Alphabet
INDUSTRY REPORT · 2026-04-29

Massive AI Scraping Campaign Hits Record Scale: 1 in Every 2,000 Public IPs Involved

Key Takeaways

  • A single 24-hour scraping campaign involved approximately 1 in 2,000 public IPv4 addresses globally: 2.04 million unique IPs generating over 5 million bot-classified requests
  • Prominent sources included networks operated by major tech companies, particularly Microsoft and Google / Alphabet. This suggests possible direct involvement by companies running large-scale AI training operations, although cloud customers renting capacity on those networks are another possibility
  • 99.77% of traffic came from IPv4 addresses spread across 202 of the 256 IPv4 /8 blocks, suggesting deliberate load distribution to evade geographic or network-based blocking
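Figures like "99.77% IPv4" and "202 of 256 /8 blocks" come from summarizing the source addresses in a request log. Below is a minimal sketch of that summary, assuming a hypothetical log already parsed into one client IP string per bot-classified request. The original analysis's tooling and log format are not described here, and whether its 99.77% counts requests or unique addresses is not stated, so the sketch reports both.

```python
import ipaddress
from collections import Counter


def summarize_sources(client_ips):
    """Summarize client IP strings from a request log (one entry per request).

    Reports unique addresses, the IPv4 share of requests and of unique
    addresses, and how many of the 256 possible IPv4 /8 blocks appear.
    """
    requests_per_ip = Counter(ipaddress.ip_address(ip) for ip in client_ips)
    total_requests = sum(requests_per_ip.values())
    v4_addrs = [addr for addr in requests_per_ip if addr.version == 4]
    v4_requests = sum(requests_per_ip[addr] for addr in v4_addrs)
    # A /8 block is identified by the top 8 bits (first octet) of the address.
    slash8_blocks = {int(addr) >> 24 for addr in v4_addrs}
    return {
        "unique_ips": len(requests_per_ip),
        "total_requests": total_requests,
        "ipv4_request_share": v4_requests / total_requests if total_requests else 0.0,
        "ipv4_ip_share": len(v4_addrs) / len(requests_per_ip) if requests_per_ip else 0.0,
        "slash8_blocks_seen": len(slash8_blocks),
    }


# Hypothetical sample using documentation address ranges (RFC 5737, RFC 3849).
sample = ["203.0.113.7", "203.0.113.7", "198.51.100.20", "192.0.2.1", "2001:db8::1"]
print(summarize_sources(sample))
```

Parsing every entry through `ipaddress.ip_address` also normalizes IPv6 spellings, so two textual forms of the same address are counted once.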
Source: Hacker News, https://vulpinecitrus.info/blog/one-in-every-2000-ipv4-visualizing-ddos-ai-web-scrapers/

Summary

A detailed analysis of one of the largest coordinated web scraping attacks on record reveals the staggering scale of data collection operations for AI training. On April 24th, 2026, infrastructure operator HotGarbage logged traffic from 2,040,670 unique IP addresses, approximately one in every 2,000 public IPv4 addresses globally, hitting its websites in a single 24-hour period. The sustained assault exceeded 4,000 requests per minute and completely overwhelmed a modest VPS with a single CPU core. A subsequent wave reached even higher volumes.
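The headline ratio and the rate figure can be checked with back-of-envelope arithmetic. The sketch below divides the IPv4 space by the 2,040,670 unique addresses, both for the full 2^32 space and for what remains after subtracting a hand-picked list of special-purpose blocks. That list is our assumption, loosely following the IANA special-purpose registry, and is not necessarily what the original analysis used. The sketch also converts "over 5 million" requests in 24 hours into an average per-minute rate, treating 5,000,000 as a lower bound.

```python
import ipaddress

UNIQUE_IPS = 2_040_670      # unique source addresses reported for April 24th
BOT_REQUESTS = 5_000_000    # "over 5 million", so this is a lower bound
MINUTES_PER_DAY = 24 * 60

# Assumption: a simplified list of non-public IPv4 ranges (this-network,
# private, CGNAT, loopback, link-local, documentation, benchmarking,
# multicast, reserved).
NON_PUBLIC = [
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24",
    "224.0.0.0/4", "240.0.0.0/4",
]


def address_space_ratios(unique_ips):
    """Return (all IPv4 / unique_ips, approx. public IPv4 / unique_ips)."""
    total = 2 ** 32
    # collapse_addresses merges overlapping/adjacent blocks so nothing is double-counted.
    nets = ipaddress.collapse_addresses(ipaddress.ip_network(n) for n in NON_PUBLIC)
    non_public = sum(net.num_addresses for net in nets)
    return total / unique_ips, (total - non_public) / unique_ips


all_ratio, public_ratio = address_space_ratios(UNIQUE_IPS)
avg_per_minute = BOT_REQUESTS / MINUTES_PER_DAY
avg_per_ip = BOT_REQUESTS / UNIQUE_IPS

print(f"1 in {all_ratio:,.0f} of all IPv4 addresses")
print(f"1 in {public_ratio:,.0f} of (approximately) public IPv4 addresses")
print(f"average {avg_per_minute:,.0f} requests/minute, {avg_per_ip:.2f} requests per IP")
```

This gives roughly 1 in 2,105 of all IPv4 addresses and about 1 in 1,814 of the public remainder, so "approximately one in 2,000" holds under either reading. The daily average of about 3,472 requests per minute is consistent with sustained peaks above 4,000.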

Analysis of the source IP addresses identified origins across diverse networks, with Microsoft (AS8075) and Google / Alphabet (AS15169) particularly prominent. A single Microsoft IP address (74.7.227.156) generated 150,483 requests on its own, while multiple Google IPs showed coordinated patterns typical of systematic data collection. Across the global IPv4 address space, the traffic touched 202 of the 256 /8 blocks and produced over 5 million requests classified as bot traffic, consistent with a deliberate strategy of spreading load across many IP addresses to evade traditional IP-based blocking defenses.
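Attributing addresses to networks such as AS8075 or AS15169 usually means mapping each IP to the origin AS of its most specific announced BGP prefix. Below is a minimal sketch of that longest-prefix match and a per-AS tally. It assumes a tiny hypothetical prefix table built only from documentation ranges and documentation ASNs (RFC 5737 and RFC 5398); a real analysis would load a full routing-table snapshot or an IP-to-ASN dataset. The origin AS identifies the network announcing the prefix, not necessarily the party operating the client.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix -> origin ASN table (documentation ranges and ASNs only).
PREFIX_TO_ASN = {
    "198.51.100.0/24": 64500,
    "198.51.100.128/25": 64501,  # more specific announcement should win
    "203.0.113.0/24": 64502,
}


def build_table(prefix_map):
    """Return (network, asn) pairs sorted most specific first."""
    nets = [(ipaddress.ip_network(p), asn) for p, asn in prefix_map.items()]
    return sorted(nets, key=lambda item: item[0].prefixlen, reverse=True)


def origin_asn(ip, table):
    """Longest-prefix match by linear scan; None if no prefix covers the IP."""
    addr = ipaddress.ip_address(ip)
    for net, asn in table:
        if addr in net:  # False automatically for mismatched IP versions
            return asn
    return None


def requests_per_asn(client_ips, table):
    """Count requests per origin ASN (None = unrouted or unknown)."""
    return Counter(origin_asn(ip, table) for ip in client_ips)


table = build_table(PREFIX_TO_ASN)
log = ["198.51.100.10", "198.51.100.200", "198.51.100.201", "203.0.113.5", "192.0.2.9"]
print(requests_per_asn(log, table))
```

The linear scan keeps the sketch short; with a full routing table of roughly a million prefixes, a radix trie or a sorted-interval lookup would be the usual choice.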

This incident provides rare visibility into the infrastructure costs and operational coordination required to scrape web content at the enormous scale demanded by modern AI training pipelines. It raises fundamental questions about the sustainability of defending against such operations, the adequacy of legal protections for content creators, and whether current regulatory frameworks are equipped to address AI-driven data collection at this magnitude.

  • Standard VPS infrastructure (1 CPU core, 200 Mbps bandwidth) proved powerless against the assault, suggesting that defending against AI-driven scraping at this scale is nearly impossible for individual content creators and small-to-medium websites
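Per-IP blocking struggles here for an arithmetic reason. At roughly 2.5 requests per address per day on average (5 million requests over 2.04 million addresses), almost no single address crosses a typical per-IP threshold, even while the aggregate rate reaches thousands of requests per minute. An outlier like the address that sent 150,483 requests (about 104 per minute averaged over the day) would trip one. Below is a minimal sketch of a sliding-window per-IP limiter that makes this concrete. The 60-requests-per-minute threshold and the simulated traffic shapes are illustrative assumptions, not figures from the incident.

```python
from collections import defaultdict, deque


class PerIPLimiter:
    """Sliding-window limiter: at most `limit` requests per IP per `window` seconds."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        q = self.history[ip]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


def simulate(num_ips, requests_per_ip, spacing, limit=60):
    """Send requests_per_ip requests from each of num_ips addresses,
    round-robin, one request every `spacing` seconds overall.
    Returns (allowed, blocked)."""
    limiter = PerIPLimiter(limit=limit)
    allowed = blocked = 0
    t = 0.0
    for _ in range(requests_per_ip):
        for i in range(num_ips):
            if limiter.allow(f"ip-{i}", t):
                allowed += 1
            else:
                blocked += 1
            t += spacing
    return allowed, blocked


# One noisy address: 1,200 requests in 10 minutes (120/min); roughly half are blocked.
print(simulate(num_ips=1, requests_per_ip=1200, spacing=0.5))
# 1,000 addresses, 3 requests each, about 3,000 requests/minute in aggregate: nothing is blocked.
print(simulate(num_ips=1000, requests_per_ip=3, spacing=0.02))
```

A sliding window is used instead of fixed per-minute buckets so a client cannot double its burst by straddling a bucket boundary. Neither design helps when each source stays far below the threshold.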

Editorial Opinion

This incident starkly illustrates the enormous and highly coordinated scale of data collection operations that now underpin modern AI systems. When roughly one in every 2,000 public IPv4 addresses (about 0.05%) takes part in scraping a single operator's websites in one day, it becomes clear that web content harvesting for AI training has reached industrial proportions that dwarf traditional crawling and indexing. The prominence of major tech companies' networks among the sources raises uncomfortable questions about how aggressive data acquisition practices have become, even as these same companies publicly commit to responsible AI development. Without significant regulatory intervention or industry-wide ethical commitments, the arms race between AI data demands and infrastructure defense will only intensify.

Generative AI · Data Science & Analytics · Cybersecurity · Market Trends · Privacy & Data

© 2026 BotBeat