this post was submitted on 30 Nov 2024
74 points (92.0% liked)

Technology


I feel like every day I come across 15-20 "AI-powered tools" that "analyze" something, and none of them clearly state how they use data. This one seems harmless enough: put a profile in, and it will scrape everything about that person, including all their personal information, their location, and every post they ever made... Nothing can possibly go wrong aggregating all that personal info, right? No idea where this data is sent, where it's stored, or who it's sold to. Kinda alarming.

top 16 comments
General_Effort@lemmy.world 25 points 3 days ago

A toy like that is easy to create and not that expensive to offer. Much more expensive than some JavaScript or CSS, but in the end it's not that different.

I think people don't really understand this whole scraping thing. For example, you can torrent all of Reddit up until the API change: all the comments, profiles and usernames, including now-deleted stuff. There is a lot of outrage here over Reddit cracking down on these 3rd party tools. It's difficult to see how that outrage over cracking down on 3rd party tools fits with this outrage over not cracking down on 3rd party tools.

Anyway, if someone wants to archive all of Bluesky, they don't need to offer some AI toy. They can just download the content via the API.

You can still torrent Reddit Pushshift data past the API change. But yeah, I definitely agree otherwise: these are just cheap toys that less experienced developers create for their portfolios.

dustyData@lemmy.world 19 points 3 days ago (last edited 3 days ago)

The only money to be made in the LLM craze is data scraping, collection, filtering, collation and data set selling. When in a gold rush, don't dig, sell shovels. And AI needs a shit ton of shovels.

The only people making money are Nvidia, the third-party data center operators and data brokers. Everyone else running and using the models is losing money. Even OpenAI, the biggest AI vendor, is running at a loss. Eventually the bubble will burst, and data brokers will still have something to sell. In the meantime, the fastest way to increase model performance is by increasing model size, which means more data is needed to train them.

CheeseNoodle@lemmy.world 3 points 3 days ago

Hey hey, there's a flourishing market for NSFW AI chatbots that I'm sure are raking in the cash, essentially by reselling access credits at a higher price.

NeoNachtwaechter@lemmy.world 15 points 3 days ago

AI has tons of money.

AI companies either do this scraping, or they buy data from others who have done such scraping.

Since the AI companies are sitting on full treasure chests (venture capital), there is a vibrant market at the moment.

breakingcups@lemmy.world 6 points 3 days ago

An LLM is, at its core, just a text-processing tool. For it to be remotely useful when you're not generating text from nothing, you need data to process, preferably large amounts of it so you appear more useful. Scraping websites like this is a good way to get source data that's useful for impressing an individual you're trying to convince to give you money.

toiletobserver@lemmy.world 4 points 3 days ago

So you're saying LLMs are a pyramid scheme?!? Pardon me while I clutch my pearls.

9point6@lemmy.world 3 points 3 days ago

AI "tools" like this are an absolute piece of piss to create, and they are also the kind of thing that bro investors love to throw money at right now.

chrash0@lemmy.world 2 points 3 days ago

This is just combining existing data scraping tools with LLMs to create a pretty flimsy and superfluous product. They use the data to do what they say. If they wanted to scrape data on you, they could already do that. All they get from this is your interest and maybe some other PII, like your email address. The LLM is just incidental here. It's honestly not even as bad privacy-wise as a "hot or not" or personality quiz.

bulwark@lemmy.world 0 points 3 days ago

AI is just trending. NFTs, and before that crypto, had their moment, and they really were everywhere. They were shoehorned into places that didn't even make sense. And I think sane investors realized that and pulled out.