this post was submitted on 10 Jul 2023
111 points (92.4% liked)

dingus@lemmy.ml 10 points 1 year ago

More detailed coverage from The Verge: https://www.theverge.com/2023/7/9/23788741/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai

The complaint lays out in steps why the plaintiffs believe the datasets have illicit origins — in a Meta paper detailing LLaMA, the company points to sources for its training datasets, one of which is called ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint points out, was described in an EleutherAI paper as being put together from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” listed, says the lawsuit, are “flagrantly illegal.”

I used to have a Bibliotik account, and if this is true about ThePile, the plaintiffs very likely have at least the beginnings of a successful case.