Technology

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.

Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.

Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 5 years ago

MODERATORS

MinutePhrase@lemmy.ml

NVIDIA: Copyrighted Books Are Just Statistical Correlations to Our AI Models. (torrentfreak.com)

submitted 6 months ago by ModerateImprovement@sh.itjust.works to c/technology@lemmy.ml

[–] CrabAndBroom@lemmy.ml 21 points 6 months ago

Well then I'm not pirating anything, I'm just downloading data and if it happens to correlate to the new Aliens movie then that's not my problem. 😮

[–] SnotFlickerman@lemmy.blahaj.zone 16 points 6 months ago (1 children)

I didn't pirate anything. I just lopped off a few frames from the original file and check it out, it produces a new hash.

Different hash, different files, so it's not actually breaking copyright!

It is fucking wild that this is basically what AI companies are arguing. "We did so much piracy it no longer counts as piracy."

[–] FaceDeer@fedia.io -1 points 6 months ago

That's not what they're arguing, not even close.

[–] linearchaos@lemmy.world 15 points 6 months ago

The coolest and most frightening thing about all that is that the number of books they train the models on is immense, but the model data is very tiny by comparison. And while the compression is amazingly lossy, it still has an amazing amount of the data in there.

To NVIDIA's credit, the trained models do not contain the contents of the books, but they can still tell you intimate details about the books without being able to provide a photographic reproduction of everything in them.

We've literally created something that can analyze books in the same way that we read them and retain the same lossy levels of information. That's honestly pretty f****** amazing.

Obviously intellectual property laws aren't designed for this. Hell, even our concept of intellectual property isn't designed for this. If this were a corporation that hired a thousand people to read a bunch of books and be on tap for queries about the information in those books, nobody would complain. One purchased copy of each book would be enough to cover the intellectual property restrictions for this.

Also, obviously, this isn't what happened, and people see money lying on the table.

[–] queermunist@lemmy.ml 12 points 6 months ago (1 children)

They literally trained their models on the books lol

[–] mriguy@lemmy.world 8 points 6 months ago

You’d think they of all people would understand the concept of data leakage.

[–] lordnikon@lemmy.world 7 points 6 months ago