AI: Help or piracy? Claims against OpenAI and Meta

At the beginning of 2023, ChatGPT became the fastest growing consumer app in history, reaching 100 million monthly active users in January before being supplanted by Meta’s Threads app.

But as AI grows, so does the number of lawsuits brought against it by the creative community. One target is Meta’s “large language model” LLaMA (Large Language Model Meta AI), AI software designed to produce convincingly natural and literate text in response to user queries.

“Instead of being programmed in the traditional way (by engineers creating thousands of pages of code), a large language model is ‘trained’ by copying huge amounts of text and extracting expressive information from it. The text is called a training dataset,” explains the complaint filed in the U.S. District Court for the Northern District of California, where Facebook’s parent company is based.

Sarah Silverman, Christopher Golden, and Richard Kadrey vs. OpenAI and Meta

Meta released LLaMA in February 2023. In the summer of 2023, comedian Sarah Silverman, joined by authors Christopher Golden and Richard Kadrey, filed lawsuits in U.S. District Court against both Meta and OpenAI (an American artificial intelligence research organization), alleging copyright infringement in each.

The lawsuits allege, among other things, that OpenAI’s ChatGPT and Meta’s LLaMA were trained on illegally obtained datasets containing the authors’ works, which they say were acquired from “shadow libraries” such as Bibliotik, Library Genesis, Z-Library, and others, noting that these books are “available in bulk through torrent systems”.

In the lawsuit against OpenAI, the three authors offer as evidence the fact that ChatGPT summarizes their books upon request, which they say violates their copyrights. The first book shown summarized by ChatGPT is Silverman’s The Bedwetter, with Golden’s Ararat and Kadrey’s Sandman Slim also used as examples. The complaint states that the chatbot has never attempted to “reproduce any of the copyright owner information that plaintiffs have included in their published works”.

The separate lawsuit against Meta alleges that the authors’ books were included in the datasets that Meta used to train its LLaMA models, a quartet of open-source AI models that the company introduced in February 2023.

The complaint lays out step by step why the plaintiffs believe the datasets are of illegal origin. In Meta’s paper describing LLaMA, the company lists the sources of its training datasets, one of which is called The Pile, assembled by an organization called EleutherAI. The complaint states that EleutherAI’s own paper describes The Pile as being compiled from “a copy of the contents of the private tracker Bibliotik”. Bibliotik and the other listed “shadow libraries,” the complaint says, are “blatantly illegal.”

In both lawsuits, the authors state that they “did not consent to the use of their copyrighted books as training material” for the companies’ AI models. Each complaint contains six counts covering various types of copyright infringement, negligence, unjust enrichment, and unfair competition. The main remedies sought are compensation for damages and the return of profits.

At a hearing in November 2023, a federal judge said he would dismiss part of the lawsuit in which the authors, including Silverman, allege that Meta’s LLaMA infringes their copyrights.

Judge Vince Chhabria stated that the authors’ claim that the text generated by LLaMA infringes their copyrights simply does not hold water. “When I make a request to LLaMA, I don’t ask for a copy of Sarah Silverman’s book – I don’t even ask for an excerpt,” Chhabria said, noting that, according to the authors’ theory, a comparison of the text generated by the AI application and Silverman’s book should have shown their similarity.

However, the judge said that he would not dismiss the case with prejudice, meaning that the authors would be allowed to amend and refile their claims. Moreover, the central claim of the lawsuit, that Meta’s use of unauthorized copies to train its AI model is an infringement, remains.

A group of authors against OpenAI

Following Sarah Silverman, Christopher Golden, and Richard Kadrey, another group of authors from the United States, including Pulitzer Prize-winning author Michael Chabon, filed a lawsuit against OpenAI in federal court in San Francisco, accusing the Microsoft-backed company of misusing their works to train the popular AI-powered chatbot ChatGPT.

Chabon, playwright David Henry Hwang, and authors Matthew Klam, Rachel Louise Snyder, and Ayelet Waldman alleged in their September 2023 complaint that OpenAI copied their works without permission to train ChatGPT to respond to human text prompts.

This is at least the third proposed class-action copyright infringement lawsuit filed by authors against Microsoft-backed OpenAI.

Companies including Microsoft, Meta Platforms, and Stability AI have also been sued by copyright holders over the use of their work in AI training.

The new lawsuit out of San Francisco states that works such as books, plays, and articles are particularly valuable for training ChatGPT as “the best examples of high-quality, extended writing.”

The plaintiffs own the copyrights to their books and written works, and have never consented to their use as training material for ChatGPT.

The works of Chabon (Wonder Boys, The Amazing Adventures of Kavalier & Clay, The Yiddish Policemen’s Union), Hwang (M. Butterfly, Chinglish, Yellow Face, Golden Child), and the other plaintiffs “contain copyright management information that provides information about the copyrighted work, including the title of the work, its ISBN or copyright registration number, the author’s name, and the year of publication,” the complaint states.

Claims against AI are only gaining momentum

All of these lawsuits make similar allegations: AI can “generate text in the style of a particular author” or “provide in-depth analysis” of authors’ books simply because the books were “copied” without the authors’ permission as part of its “training data,” as stated in the OpenAI complaint. This data includes copies allegedly collected from notorious piracy sites.

The decision to dismiss some of the authors’ claims described above came after another federal judge dismissed similar claims in a lawsuit filed by a group of artists against Stability AI, Midjourney, and DeviantArt (see our article “Artificial Intelligence: What About Rights and Protection?”). In that case, Judge William Orrick stated that he was “not sure that allegations based on the performance of the systems can survive without proof that the images were substantially similar to the artists’ works.”

Lawsuits like these are not just a headache for OpenAI, Meta, and other AI companies; they call into question the very boundaries of copyright. We will likely see lawsuits centered on this issue for many years to come, until both international law and the national laws of every country regulate the use of intellectual property by AI.
