this post was submitted on 26 Jan 2024
Technology
Do training weights contain the data? Are the servers copying said data on a mass scale, in a way that the original copyright holders don't want or can't control?
Data is not copyrighted; only the image is. Furthermore, you cannot copyright a number, even though you could use a sufficiently large number to completely represent a specific image. There's also the fact that copyright does not protect possession of works, only distribution of them. If I obtained a copyrighted work, no matter the means chosen to do so, I've committed no crime so long as I don't duplicate that work. This gets into a legal grey area around computers and the fundamental way they work, but it was already kind of fuzzy if you really think about it. Does viewing a copyrighted image violate copyright? The visual data of that image has been copied into your brain. You have the memory of that image. If you have the talent, you could even reproduce that copyrighted work, so clearly a copy of it exists in your brain.
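To make the "a number can represent an image" point concrete, here is a minimal Python sketch: any file's bytes can be read as one very large integer and turned back into identical bytes. The byte string is a made-up stand-in for an image file, not a real image. The length has to be kept alongside the number, since leading zero bytes would otherwise be lost.

```python
# Minimal sketch: a file's bytes <-> one (very large) integer, losslessly.
# `fake_image` is a hypothetical stand-in, not a real image file.

def bytes_to_number(data: bytes) -> int:
    """Interpret the bytes as a single big-endian unsigned integer."""
    return int.from_bytes(data, byteorder="big")


def number_to_bytes(n: int, length: int) -> bytes:
    """Turn the integer back into exactly `length` bytes (big-endian)."""
    return n.to_bytes(length, byteorder="big")


# A PNG signature followed by 32 made-up payload bytes (40 bytes total).
fake_image = b"\x89PNG\r\n\x1a\n" + bytes(range(32))

n = bytes_to_number(fake_image)
restored = number_to_bytes(n, len(fake_image))

print(n.bit_length())          # 320: the "number" is 320 bits long
print(restored == fake_image)  # True: the round trip is exact
```

A real image file would just give a much bigger number; the conversion itself is the same.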
Yeah. And the hard drives and networks that pass Midjourney's network weights around?
That's distribution. Did Midjourney obtain a license from the artists to allow large amounts of copyrighted "Joker" data to be copied onto a ton of servers in their data center so that Midjourney can run? They're clearly letting the public use this data.
Because they're not copying around images of Joker; they're copying around a work derived from many, many things, including images of Joker. Copying a derived work does not violate the copyright of the work it was derived from. The wrinkle in this case is that you can extract something very similar to the original works back out of the derived work after the fact. It would be like if you could bake a cake, pass it around, and then down the line pull a whole egg back out of it. Maybe not the exact egg you started with, but one very similar to it. This situation is unlike anything that's come before, which is why it's not actually covered by copyright. New laws will need to be drafted (or, at a bare minimum, legal judgements made) to decide exactly how this situation should be handled.
Someone already downvoted you, but this is exactly the topic of debate surrounding this issue.
Other recognized fair-use exemptions have similar interpretations: a computer model analyzes a large corpus of copyrighted works for the purpose of searching their contents and retrieving relevant snippets and works based on semantic and abstract similarities. The computer model that represents those works for that purpose is fair use: it contains only factual information about those works. It doesn't matter if the works used to build that model were unlicensed: the model is considered fair use.
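As a rough illustration of a model that holds "factual information about works" rather than the works themselves, here is a toy inverted index in Python. It is a sketch of the general idea only, not how any real search engine is built (real systems usually also store word positions or snippets, which is how they return excerpts). The corpus and document IDs are hypothetical.

```python
# Toy inverted index: records which words appear in which document,
# not the documents themselves. The corpus below is hypothetical.
from collections import defaultdict

PUNCT = ".,!?\"'"


def build_index(corpus: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for word in text.lower().split():
            index[word.strip(PUNCT)].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every word in the query."""
    words = [w.strip(PUNCT) for w in query.lower().split()]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for w in words[1:]:
        results &= index.get(w, set())
    return results


corpus = {
    "book_a": "The joker laughed at the moon.",
    "book_b": "A clown and a joker walked into a bar.",
    "book_c": "The moon was bright over the bay.",
}
index = build_index(corpus)
print(sorted(search(index, "joker")))     # ['book_a', 'book_b']
print(sorted(search(index, "the moon")))  # ['book_a', 'book_c']
```

Word order and repetitions are thrown away here, so this index alone cannot rebuild the original sentences.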
AI models operate by a very similar method, albeit one with a lot more complexity. But the model doesn't contain copyrighted works; it is itself only a collection of factual information about the copyrighted works. The novel part of this case is that it can be used to reconstruct expressions very similar to the originals (it should be pointed out that the fidelity is often very low, and the more detailed the output, the less it resembles the original). It isn't settled yet whether that fact changes this interpretation, but regardless, I think copyright is already not the right avenue to pursue if the goal is to remediate or prevent harm to creators and encourage novel expression.
Right, you're basically making the same points as me, although technically the model itself is a copyrighted work. Part of the problem we're running into these days is that copyright, patent, trademark, and trade secret all date from a time when the differences between those things were fairly intuitive. In our modern digital world, with things like 3D printers and the ease with which you can rapidly change the formats and encodings of arbitrary pieces of data, the lines all start to blur together.
If you have a 3D scan of a statue of Pikachu, what rights are involved there? What if you print it? What if you use that model to generate a PNG? What if you print that PNG? What if you encode the model file using base64 and embed it in the middle of a GIF of Rick Astley?
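For the base64-in-a-GIF scenario, here is a minimal Python sketch of how little effort the re-encoding takes. It base64-encodes some stand-in "model file" bytes (hypothetical, not a real STL file) and wraps them in a GIF89a Comment Extension block: 0x21 0xFE, then data sub-blocks of at most 255 bytes each prefixed by its length, then a 0x00 terminator. It does not build a complete GIF. A real file would also need a header, screen descriptor, image data and trailer around this block.

```python
# Sketch: base64-encode a hypothetical "model file" and wrap it in a
# GIF89a Comment Extension block, then recover the original bytes.
import base64


def make_comment_extension(payload: bytes) -> bytes:
    out = bytearray(b"\x21\xfe")       # extension introducer + comment label
    for i in range(0, len(payload), 255):
        chunk = payload[i:i + 255]
        out.append(len(chunk))          # sub-block size byte (1..255)
        out += chunk
    out.append(0)                       # block terminator
    return bytes(out)


def read_comment_extension(block: bytes) -> bytes:
    if block[:2] != b"\x21\xfe":
        raise ValueError("not a GIF comment extension")
    data = bytearray()
    pos = 2
    while block[pos] != 0:              # a zero-size sub-block ends the data
        size = block[pos]
        data += block[pos + 1:pos + 1 + size]
        pos += 1 + size
    return bytes(data)


model_file = b"solid pikachu\n" * 40    # stand-in bytes, 560 in total
encoded = base64.b64encode(model_file)  # ASCII-only text form of the bytes
block = make_comment_extension(encoded)

recovered = base64.b64decode(read_comment_extension(block))
print(len(model_file), len(encoded), len(block))  # 560 748 754
print(recovered == model_file)                    # True
```

The bytes survive every step unchanged, which is exactly why "which format is it in?" is a poor test for what rights apply.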
Corporations have already utterly fucked all our IP laws, it might be time to go back to the drawing board and reevaluate the whole thing, because what we have now often feels like it has more cracks than actual substance.
Yeah, sorry if it wasn't clear, but I was agreeing with you (defending against the downvote).
There are a lot of things at play here, even if there seems to be a clear way to interpret copyright law (untested, but still) that would make the models fair use. I think people are rightfully angry and frustrated with the size of the companies building these models, and with the risk posed by private ownership of them. If I were inclined to be idealistic, I would say that the models should be in the public domain and taxes should be used to provide a UBI that offsets any job losses caused by the efficiencies of automation, but that's a tall order.
https://www.law.cornell.edu/wex/derivative_work
Are you just making shit up?
The answer to that question is extensively documented in thousands of research papers; it's not up for debate.
If someone wants to read one of those papers, I can recommend "Extracting Training Data from Diffusion Models". It shouldn't be too hard for someone with little experience in the field to follow along.
Their response will be: "We don't know; we can't understand what it's doing."
What the fuck is this kind of response? It's just a fucking neural network running on GPUs with convolutional kernels. For fuck's sake, turn on your damn brain.
Generative AI is actually one of the easier subjects to comprehend here. It's just calculus: derivatives are used to backpropagate the error and adjust the weights in a way that minimizes it. Lather, rinse, repeat for a billion iterations on a mass of GPUs (i.e., 20 TFLOP compute systems) for several weeks.
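The "derivatives, then lather, rinse, repeat" loop can be shown at toy scale. This Python sketch fits a single linear unit `y = w*x + b` to made-up data by gradient descent on mean squared error, with the derivatives written out by hand. It is a simplification chosen for illustration: real generative models have billions of weights, many nonlinear layers, and use automatic differentiation, but each update step follows the same pattern.

```python
# Toy gradient descent: fit y = w*x + b by minimizing mean squared error.
# The data and hyperparameters (lr, steps) are illustrative assumptions.

def train(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Forward pass: prediction error for every sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Backward pass: derivatives of L = (1/n) * sum(err^2)
        #   dL/dw = (2/n) * sum(err * x),  dL/db = (2/n) * sum(err)
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Update: step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]  # toy data generated with w=3, b=1
w, b = train(xs, ys)
print(round(w, 3), round(b, 3))   # 3.0 1.0
```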
Come on, this stuff is well understood in computer science by now. It was understood 20 years ago when I learned it, and today, now that AI is all hype, more and more people understand the basics.
Understanding the math behind it doesn't automatically mean understanding the decision process during forward propagation. Of course you can follow it mathematically, but you're quickly gonna lose track of the big picture with that many weights. There's a reason XAI (explainable AI) is an entire subfield of machine learning.
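To illustrate the gap between "you can follow the math" and "you understand the decision", here is a Python sketch of one very simple attribution idea: nudge each input slightly and measure how much the output moves (a central finite-difference sensitivity). The tiny network and its weights are hypothetical. Established XAI methods (gradient saliency, SHAP, LIME and others) are considerably more involved.

```python
# Sketch: input sensitivity of a tiny, hypothetical 3-2-1 tanh network,
# estimated with central finite differences.
import math

W1 = [[0.5, -1.2, 0.3],   # hidden layer: 2 units x 3 inputs
      [1.0, 0.4, -0.7]]
B1 = [0.1, -0.2]
W2 = [1.5, -2.0]          # output layer: 1 unit x 2 hidden units
B2 = 0.05


def forward(x):
    """Every step is plain arithmetic: weighted sums followed by tanh."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def sensitivities(x, eps=1e-5):
    """Approximate d(output)/d(input_i) for each input i."""
    result = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        result.append((forward(up) - forward(down)) / (2 * eps))
    return result


x = [0.2, -0.5, 0.9]
print(forward(x))
print([round(s, 3) for s in sensitivities(x)])
```

The numbers say which inputs mattered at this one point, not why; with two hidden units that is manageable, with billions of weights it is not.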
Ummm... it's lossy compressed data from the training set.
Is it a perfect copy? No. But copyright law covers "derivative works", so whatever: the law remains clear on this situation.
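Whether model weights are fairly described as lossy compression is exactly what this thread is arguing about, so the following Python sketch only illustrates what "lossy" means in general. Made-up 8-bit grayscale values are quantized to 16 levels (4 bits each) and then reconstructed. The result is close to, but not identical to, the original.

```python
# Sketch of lossy compression by uniform quantization.
# `pixels` are hypothetical 8-bit grayscale values (0..255).

def quantize(values, levels):
    """Map 0..255 values to integer codes 0..levels-1 (the compressed form)."""
    step = 256 / levels
    return [int(v // step) for v in values]


def dequantize(codes, levels):
    """Reconstruct each value as the midpoint of its quantization bin."""
    step = 256 / levels
    return [round((c + 0.5) * step) for c in codes]


pixels = [12, 200, 201, 90, 255, 0, 128, 64]
codes = quantize(pixels, levels=16)      # 4 bits per value instead of 8
restored = dequantize(codes, levels=16)
print(restored)  # [8, 200, 200, 88, 248, 8, 136, 72]
print(max(abs(a - b) for a, b in zip(pixels, restored)))  # 8
```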
Bro, who even knows calculus anymore? We have calculators for a reason 🤷‍♀️