How can they prove that it was their particular intellectual property, and not just some abstract public data, that was used to train the algorithms?
Well, if you ask e.g. ChatGPT for the lyrics to a song or page after page of a book, and it spits them out 1:1 correct, you could assume that it must have had access to the original.
Or at least excerpts from it. But even then, it's one thing for a person to put up a quote from their favourite book on their blog, and a completely different thing for a private company to use that data to train a model, and then sell it.
Even more so, if you consider that the LLMs are marketed to replace the authors.
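For what it's worth, the "spits them out 1:1" test above can be made a bit more concrete. Here is a minimal sketch, assuming you already have the original text and the model's output as plain strings; `longest_verbatim_run` is a hypothetical helper name, not part of any existing tool. It only measures the longest word-for-word run the two texts share, and says nothing about how the text got into the model.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(original, output):
    """Return the longest run of consecutive words shared by both texts.

    Hypothetical helper for illustration: a long shared run hints that the
    output reproduces the original verbatim, but it is not proof of anything.
    """
    a = original.lower().split()
    b = output.lower().split()
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent items in long sequences, which would hide common words.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


if __name__ == "__main__":
    book = "the quick brown fox jumps over the lazy dog"
    reply = "he said the quick brown fox jumps far"
    run = longest_verbatim_run(book, reply)
    print(len(run), "words in a row:", " ".join(run))
```

A run of a few words means little, since common phrases overlap by chance; page after page of matching text is a different matter.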
you could assume that it must have had access to the original.
I don't know if that's true. Say Google grabs that book from a pirate site, then publishes the work as search results. ChatGPT grabs the work from the Google results and cobbles it back together as the original.
Who's at fault?
I don't think it's as straightforward as "ChatGPT can reproduce the work, therefore it stole it."
Both are at fault: Google for distributing pirated material and OpenAI for using said material for financial gain.
There are a lot of possible ways to audit an AI for copyrighted works, several of which have been proposed in the comments here, but what this could lead to is laws requiring an accounting log of all material that has been used to train an AI, as well as all copyrights, compensation, etc.
Not without some seriously invasive warrants! Ones that will never be granted for an intellectual property case.
Intellectual property is an outdated concept. It used to exist so wealthier outfits couldn't copy your work at scale and muscle you out of an industry you were championing.
It simply does not work the way it was intended. As technology spreads, the barrier for entry into most industries wherein intellectual property is important has been all but demolished.
E.g. 50 years ago: your song that your band performed is great. I have a recording studio and am gonna steal it muahahaha.
Today: "anyone have an audio interface I can borrow so my band can record, mix, master, and release this track?"
Intellectual property ignores the fact that, idk, Isaac Newton and Gottfried Wilhelm Leibniz both independently invented calculus at the same time on opposite ends of a disconnected globe. That is to say, intellectual property doesn't exist.
Ever opened a post to make a witty comment to find someone else already made the same witty comment? Yeah. It's like that.
Spoken like someone who has never had something they worked on for years be stolen.
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.
When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.
Folks, this isn’t a new problem, and it doesn’t need new laws.
It's 100% a new problem. There's established precedent for things costing different amounts depending on their intended use.
For example, buying a consumer copy of a song doesn't give you the right to play that song in a stadium or a restaurant.
Training an entire AI to make a potentially infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment to consent; a climate expert might not want their work in an AI which might severely mischaracterize the conclusions, or might want to require that certain queries are regularly checked by a human, etc.
I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:
"He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’"
It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?
When you sell a book, you don’t get to control how that book is used.
This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.
Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.
This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.
No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.
My argument is that there is nothing wrong with training an AI with a book - that’s input for the AI, and that is indistinguishable from a human reading it.
Now what the AI does with the content - if it plagiarizes or violates fair use - that’s a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be input into an AI than they can restrict what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI just like they can if I print copies of their book.
My objection is strictly on the input side, and the output is already restricted.
This is a little off. When you quote a book, you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?
I think the gist of these authors’ complaints is that a sort of “technology laundered plagiarism” is occurring.
There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.
That's part of the allegation, but it's unsubstantiated. It isn't entirely coherent.
You know what would solve this? We all collectively agree this fucking tech is too important to be in the hands of a few billionaires, start an actual public free open source fully funded and supported version of it, and use it to fairly compensate every human being on Earth according to what they contribute, in general?
Why the fuck are we still allowing a handful of people to control things like this??
Someone should AGPL their novel and force the AI company to open source their entire neural network.
So what's the difference between a person reading their books and using the information within to write something and an ai doing it?
Because AIs aren't inspired by anything and they don't learn anything
So uninspired writing is illegal?
No but a lazy copy of someone else’s work might be copyright infringement.
I don't know how I feel about this honestly. AI took a look at the book and added the statistics of all of its words into its giant statistic database. It doesn't have a copy of the book. It's not capable of rewriting the book word for word.
This is basically what humans do. A person reads 10 books on a subject, studies, becomes something of a subject matter expert, and writes their own book.
Artists use reference art all the time. As long as they don't get too close to the original reference nobody calls any flags.
These people are scared for their viability in their user space and they should be, but I don't think trying to put this genie back in the bottle or extra charging people for reading their stuff for reference is going to make much difference.
It’s not at all like what humans do. It has no understanding of any concepts whatsoever, it learns nothing. It doesn’t know that it doesn’t know anything even. It’s literally incapable of basic reasoning. It’s essentially taken words and converted them to numbers, and then it examines which string is likely to follow each previous string. When people are writing, they aren’t looking at a huge database of information and determining the most likely word to come next, they’re synthesizing concepts together to create new ones, or building a narrative based on their notes. They understand concepts, they understand definitions. An AI doesn’t, it doesn’t have any conceptual framework, it doesn’t even know what a word is, much less the definition of any of them.
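To make "examines which string is likely to follow each previous string" concrete, here is a toy sketch of that idea as a word-pair counter. This is an assumption-heavy simplification for illustration only: real LLMs are neural networks over tokens, not lookup tables like this, and the function names `train_bigrams` and `most_likely_next` are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it and how often.

    Toy illustration only; not how a real LLM stores anything.
    """
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    table = train_bigrams("the cat sat on the mat and the cat slept")
    print(most_likely_next(table, "the"))    # "cat" follows "the" twice, "mat" once
    print(most_likely_next(table, "zebra"))  # never seen, so None
```

Note that the table holds only counts of word pairs, which is the sense in which the earlier comment says the model "doesn't have a copy of the book"; whether that description holds for a large model that can recite passages is exactly what is being argued here.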
How can you tell that our thoughts don't come from a biological LLM? Maybe what we conceive of as "understanding" is just a feeling emerging from a more fundamental mechanism, like temperature emerges from the movement of particles.
I think this is more about frustration experienced by artists in our society at being given so little compensation.
The answer is staring us in the face. UBI goes hand in hand with developments in AI. Give artists a basic salary from the government so they can afford to live well. This isn't an AI problem, this is a broken society problem. I support artists advocating for themselves, but the fact that they aren't asking for UBI really speaks to how hopeless our society feels right now.
This is tough. I believe there is a lot of unfair wealth concentration in our society, especially in the tech companies. On the other hand, I don't want AI to be stifled by bad laws.
If we try to stop AI, it will only take it away from the public. The military will still secretly use it, companies might still secretly use it. Other countries will use it and their populations will benefit while we languish.
Our only hope for a happy ending is to let this technology be free and let it go into the hands of many companies and many individuals (there are already decent models you can run on your own computer).
So, in your "only hope for a happy ending" scenario, how do the artists get paid? Or will we no longer need them after AI runs everything ;)
This is so stupid. If I read a book and get inspired by it and write my own stuff, as long as I'm not using the copyrighted characters, I don't need to pay anyone anything other than purchasing the book which inspired me originally.
If this were a law, why shouldn't pretty much every modern-day fantasy author pay the Tolkien foundation, or every non-fiction author pay for each citation?
There's a difference between a sapient creature drawing inspiration and a glorified autocomplete using copyrighted text to produce sentences which are only cogent due to substantial reliance upon those copyrighted texts.
All AI creations are derivative and subject to copyright law.
There’s a difference between a sapient creature drawing inspiration and a glorified autocomplete using copyrighted text to produce sentences which are only cogent due to substantial reliance upon those copyrighted texts.
But the AI is looking at thousands, if not millions, of books, articles, comments, etc. That's what humans do as well - they draw inspiration from a variety of sources. So is sentience the distinguishing criterion for copyright? Only a being capable of original thought can create original work, and therefore anything not capable of original thought cannot create copyrighted work?
Also, irrelevant here but calling LLMs a glorified autocomplete is like calling jet engines a "glorified horse". Technically true but you're trivialising it.
Isn’t learning the basic act of reading text? I’m not sure what the AI companies are doing is completely right but also, if your position is that only humans can learn and adapt text, that broadly rules out any AI ever.
Isn’t learning the basic act of reading text?
not even close. that's not how AI training models work, either.
if your position is that only humans can learn and adapt text
nope-- their demands are right at the top of the article and in the summary for this post:
Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools
that broadly rules out any AI ever
only if the companies training AI refuse to pay
While I am rooting for authors to make sure they get what they deserve, I feel like there is a bit of a parallel to textbooks here. As an engineer, if I learn about statics from a textbook and then go use that knowledge to help design a bridge that I and my company profit from, the textbook company can't sue. If my textbook has a detailed example for how to build a new bridge across the Tacoma Narrows, and I use all of the same design parameters for a real Tacoma Narrows bridge, they may have much more of a case.