Report: Generative AI violates copyrights of news publishers
News publishers are concerned that AI answers will replace search traffic, leading to less revenue, lost jobs and other damages.
AI companies (e.g., Google, OpenAI) are primarily using high-quality content created by news publishers to train generative AI systems, which then compete directly against those publishers.
That’s the core argument made in a new report from the News Media Alliance (NMA), a trade association for 2,200 publishers in the U.S. and Canada.
Why we care. Since the arrival of Bing Chat, Google Bard and Google’s Search Generative Experience, publishers of all sizes have been concerned about generative AI replacing search. That could have a devastating impact on organic traffic, revenue and even a brand’s image (e.g., through hallucinations, such as Bing Chat discussing the New York Times endorsing Donald Trump as the 2024 Republican nominee for president).
What is happening. The NMA compared public datasets used to train popular large language models (LLMs), which power AI chatbots like ChatGPT, to an open-source dataset of generic web content. According to the report, the curated datasets contained up to 100 times more news content than the generic dataset. The report also found that the LLMs copied the exact language of news articles. This supports the NMA’s long-held position that Google and other tech companies are using news organizations’ work without paying for it.
“It genuinely acts as a substitution for our very work,” Danielle Coffey, News Media Alliance president and CEO, told the New York Times. “You can see our articles are just taken and regurgitated verbatim.”
Response to the report. Google and OpenAI have declined to comment on the report. But we know Google believes all online content should be available for AI training unless publishers opt out.
Some control for news publishers. The AI companies will continue to have ways to access content for training purposes (e.g., through licensing or crawling) unless you’ve blocked bots like Googlebot or CCBot (Common Crawl) entirely.
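For publishers who want to check what their robots.txt currently tells these crawlers, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The robots.txt content, the `blocked_agents` helper and the example.com URL are hypothetical and for illustration only; swap in your own file and URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: it blocks Common Crawl's
# crawler (CCBot) and leaves every other user agent allowed.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/news/story"):
    """Return the user agents that the given robots.txt bars from `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(ROBOTS_TXT, ["CCBot", "Googlebot"]))  # ['CCBot']
```

Keep in mind that robots.txt is a request that crawlers choose to honor, not an enforcement mechanism, and that blocking Googlebot entirely would also remove a site from Google Search.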
What’s next. The report (PDF) has been submitted along with commentary (PDF) for the U.S. Copyright Office’s Artificial Intelligence Study. In the meantime, the NMA is hoping to work out a licensing agreement with the relevant tech companies on behalf of its members.