Comedian Sarah Silverman and authors Richard Kadrey and Christopher Golden have filed copyright infringement lawsuits against Meta Platforms and OpenAI over their LLaMA and ChatGPT models, respectively.
Meta and OpenAI allegedly used the plaintiffs’ content without permission to train their respective artificial intelligence (AI) systems.
According to the court documents filed against Meta, many of the plaintiffs’ copyright-protected books are included in the dataset “Meta has admitted to using to train LLaMA.”
Similarly, the case against OpenAI alleges that the summaries ChatGPT generates of the plaintiffs’ works indicate that it was trained on copyrighted content.
“The summaries get some details wrong. This is expected since a large language model mixes together expressive material derived from many sources. Still, the rest of the summaries are accurate…”
The lawsuits allege that the companies obtained the copyrighted material from so-called “shadow libraries,” such as Bibliotik, Library Genesis, Z-Library, and others.
According to the lawsuit, these shadow libraries are websites that use torrent systems to make books “available in bulk.” Such websites are illegal and are distinct from openly available sources such as Project Gutenberg, which collects books whose copyrights have expired.
“These shadow libraries have long been of interest to the AI-training community because of the large quantity of copyrighted material they host.”
In addition to bringing their own claims of copyright infringement, the authors filed the complaints on behalf of a class of copyright owners across the United States whose works were also allegedly infringed.
In May, members of the Writers Guild of America across the United States went on strike – the first in 15 years – highlighting many issues facing the industry, including the use of artificial intelligence.