Open Source

All about open source! Feel free to ask questions and share news and interesting stuff!

Rules

Posts must be relevant to the open source ideology
No NSFW content
No hate speech, bigotry, etc.

Community icon from opensource.org, but we are not affiliated with them.

founded 5 years ago

MODERATORS

Cloak@lemmy.ml

kevincox@lemmy.ml

CrypticCoffee@lemmy.ml

Lettuceeatlettuce@lemmy.ml

Can AI even be open source? It's complicated (miniza.pages.dev)

submitted 4 months ago by marvelous_coyote@lemm.ee to c/opensource@lemmy.ml

57 comments

cmnybo@discuss.tchncs.de 40 points 4 months ago (4 children)

It's rather hard to open source the model when you trained it off a bunch of copyrighted content that you didn't have permission to use.

marvelous_coyote@lemm.ee 9 points 4 months ago

My sentiments exactly

chebra@mstdn.io 6 points 4 months ago (1 children)

@cmnybo @marvelous_coyote That's... not how it works. You wouldn't see any copyrighted works in the model. We are already pretty sure even the closed models were trained on copyrighted works, based on what they sometimes produce. And the AI companies aren't even denying it. They are just saying it was all "fair use"; they are using a legal loophole, and they might win this. Basically, the only way they could be punished for copyright infringement is if the models produce copyrighted content verbatim.

ReakDuck@lemmy.ml 1 points 4 months ago (1 children)

Like producing images with the Disney logo

chebra@mstdn.io 2 points 4 months ago (1 children)

@ReakDuck Yup, and that's a much better avenue for fighting the AI companies, because fundamentally this is almost impossible to avoid in ML models. We should stop complaining about how they scraped copyrighted content; that complaint won't succeed until the legal loophole is closed. But when they reproduce copyrighted content, that could be fatal. This also applies to Copilot reproducing GPL code samples, for example.

ReakDuck@lemmy.ml 1 points 4 months ago

Yeah, you just summarized the thoughts I had before ChatGPT came to light.

OK, not really. My thought was: could I store an illegally made picture in an LLM and later ask it to show the picture again? I never stored it as a file, and LLMs don't seem to count as storage.

That way I could store pictures I would not be allowed to keep.

flamingmongoose@lemmy.blahaj.zone 4 points 4 months ago (1 children)

BERT and early versions of GPT were trained on copyright-free datasets like Wikipedia and out-of-copyright books. I'm unsure whether those would be big enough for modern ChatGPT-type models.

chebra@mstdn.io 2 points 4 months ago (1 children)

@flamingmongoose @cmnybo

> copyright free datasets like Wikipedia

🤦‍♂️

flamingmongoose@lemmy.blahaj.zone 1 points 4 months ago

What's up with that? I appreciate that they're permissively licensed rather than copyright-free as such.

Even_Adder@lemmy.dbzer0.com 3 points 4 months ago

Have you read this article by Cory Doctorow yet?