The Great AI Content Heist: How Bots Are Devouring the Internet – and How We Can Fight Back
The rise of generative AI has kicked off an arms race for data, as AI companies seek to ingest as much online content as possible to train their models. Text from websites, images, code repositories, music – and now video – are all being vacuumed up. A bombshell report from The Atlantic in September 2025 revealed the sheer scale of this activity on YouTube: more than 15.8 million videos were quietly scraped and downloaded without permission as training data for AI (theatlantic.com). These weren’t obscure clips either – nearly 1 million were how-to videos, and countless others came from popular creators and even major organizations like the BBC and TED (theatlantic.com). In many cases the videos were stripped of titles or creator names in the datasets to obscure their origin (theatlantic.com), but investigators traced the data back to real YouTube channels. Crucially, this mass downloading violates YouTube’s terms of service – yet it has been happening largely unchecked (theatlantic.com). AI developers have used third-party tools to rip videos en masse. YouTube appears to have done little, if anything, to stop the mass downloading, according to The Atlantic, and the company declined to comment on the situation (theatlantic.com). In other