The Great AI Content Heist: How Bots Are Devouring the Internet – and How We Can Fight Back
AI Companies Are Scraping the Web for Everything – Without Asking

The rise of generative AI has kicked off an arms race for data, as AI companies seek to ingest as much online content as possible to train their models. Text from websites, images, code repositories, music – and now video – are all being vacuumed up. A bombshell report from The Atlantic in September 2025 revealed the sheer scale of this activity on YouTube: more than 15.8 million videos (from over 2 million channels) were quietly scraped and downloaded without permission as training data for AI (theatlantic.com). These weren’t obscure clips either – nearly 1 million were