this post was submitted on 06 Sep 2024
1725 points (90.1% liked)
Technology
"Theft" is never a technically accurate word for so-called "intellectual property": copying digital content without authorization is legal in plenty of cases, and, come on, property is by definition exclusive. I cannot copy my house or my car, but I can make copies of my works at virtually zero cost.
Using data to train ML models is even explicitly allowed in some jurisdictions (e.g. Japan), and is likely to qualify as fair use elsewhere. LLMs are highly transformative, and while they can often reproduce fragments of copyrighted works verbatim, they don't store whole works or significant pieces of them.
Don't get me wrong, I don't like big companies making big money. I would not mind a law that forced models to be open sourced. But narrowing fair use to stop them from training their models on public data would harm them very little (they could simply pay something if they are making a profit), while small researchers and companies would never be able to compete, because they could not cover the upfront costs and lack the financial engineering to disguise profits and pay less.