this post was submitted on 11 Jan 2024

250 points (99.6% liked)

Technology

37747 readers

203 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

Subcommunities on Beehaw:

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

coldredlight@beehaw.org

Los@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org

250

OpenAI says it’s “impossible” to create useful AI models without copyrighted material (arstechnica.com)

submitted 10 months ago by sculd@beehaw.org to c/technology@beehaw.org

246 comments fedilink hide all child comments

Apparently, stealing other people's work to create product for money is now "fair use" as according to OpenAI because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

(page 2) 50 comments

sorted by: hot top controversial new old

[–] Kolanaki@yiffit.net 11 points 10 months ago* (last edited 10 months ago)

Then pay for the material like everyone else who can't do things without someone else's copyrighted materials.

[–] onlinepersona@programming.dev 10 points 10 months ago (3 children)

Wait, so if the way I make money is illegal now, it's the system's fault, isn't it? That means I can keep going because I believe I'm justified, right? Right?

CC BY-NC-SA 4.0

load more comments (3 replies)

[–] ky56@aussie.zone 10 points 10 months ago (1 children)

All the AI race has done is surface the long standing issue of how broken copyright is for the online internet era. Artists should be compensated but trying to do that using the traditional model which was originally designed with physical, non infinitely copyable goods in mind is just asinine.

One such model could be to make the copyright owner automatically assigned by first upload on any platform that supports the API. An API provided and enforced by the US copyright office. A percentage of the end use case can be paid back as royalties. I haven't really thought out this model much further than this.

Machine learning is here to say and is a useful tool that can be used for good and evil things alike.

[–] Kichae@lemmy.ca 9 points 10 months ago (1 children)

Nah. Copyright is broken, but it's broken because it lasts too long, and it can be held by constructs. People should still reserve the right to not have the things they've made incorporated into projects or products they don't want to be associated with.

The right to refusal is important. Consent is important. The default permission should not be shifted to "yes" in anybody's mind.

The fact that a not insignificant number of people seem to think the only issue here is money points to some pretty fucking entitled views among the would-be-billionaires.

load more comments (1 replies)

[–] GammaGames@beehaw.org 8 points 10 months ago

Could they be legally required to open source the llm? I believe them, but that doesn’t make it right

[–] furrowsofar@beehaw.org 8 points 10 months ago

Of course it is. About 50 years ago we went to a regime where everything is copywrited rather then just things that were marked and registered. Not sure where.I stand on that. One could argue we are in a crazy over copyright era now anyway.

[–] BoastfulDaedra@lemmynsfw.com 7 points 10 months ago (2 children)

Yes, well, a pirate ship can't stay in business without raiding trade convoys, either.

load more comments (2 replies)

[–] Critical_Insight@feddit.uk 7 points 10 months ago (2 children)

There's not a musician that havent heard other songs before. Not a painter that haven't seen other painting. No comedian that haven't heard jokes. No writer that haven't read books.

AI haters are not applying the same standards to humans that they do to generative AI. Obviously this is not to say that AI can't plagiarize. If it's spitting out sentences that are direct quotes from an article someone wrote before and doesn't disclose the source then yeah that is an issue. There's however a limit after which the output differs enough from the input that you can't claim it's stealing even if perfectly mimics the style of someone else.

Just because DallE creates pictures that have getty images watermark on them it doesn't mean the picture itself is a direct copy from their database. If anything it's the use of the logo that's the issue. Not the picture.

[–] sculd@beehaw.org 8 points 10 months ago (2 children)

Said in another thread but I will repeat here. AIs are not humans. AIs' creative process and learning process are also different.

AIs are being used to make profit for executives while creators suffer.

load more comments (2 replies)

[–] BraveSirZaphod@kbin.social 7 points 10 months ago

AI haters are not applying the same standards to humans that they do to generative AI

I don't think it should go unquestioned that the same standards should apply. No human is able to look at billions of creative works and then create a million new works in an hour. There's a meaningfully different level of scale here, and so it's not necessarily obvious that the same standards should apply.

If it’s spitting out sentences that are direct quotes from an article someone wrote before and doesn’t disclose the source then yeah that is an issue.

A fundamental issue is that LLMs simply cannot do this. They can query a webpage, find a relevant chunk, and spit that back at you with a citation, but it is simply impossible for them to actually generate a response to a query, realize that they've generated a meaningful amount of copyrighted material, and disclose its source, because it literally does not know its source. This is not a fixable issue unless the fundamental approach to these models changes.

[–] DavidGarcia@feddit.nl 6 points 10 months ago

ip protections are a spook anyway

load more comments