this post was submitted on 06 Jun 2024
209 points (96.9% liked)

Open Source


I'm currently looking to develop an open source app that can help somebody, but I'm out of ideas, so I'd like to hear some from you guys.

Sorry if it seems too lazy to ask for ideas like that; I just thought I could, since the result will be a free app.

[–] Dogyote@slrpnk.net 2 points 5 months ago (1 children)

Bah, the data is on their websites, figure out how to collect it.

[–] Zetaphor@zemmy.cc 2 points 5 months ago (1 children)

Go ahead and try scraping an arbitrary list of sites without an API and let me know how that goes. It would be a constant maintenance headache, especially if you're talking about anything other than the larger chains that have fairly standardized sites.

[–] Dogyote@slrpnk.net 1 points 5 months ago (3 children)
[–] lastweakness@lemmy.world 1 points 5 months ago

I don't think you understand how AIs work

[–] chebra@mstdn.io 1 points 5 months ago

@Dogyote @Zetaphor

I've been webscraping in my job for 6 years. Yes, it's a constant headache: they keep updating their sites and improving their anti-bot protections. But it can be done, and some companies are doing it (on a biiiiig scale). It's just not very realistic that an open-source project would be able to invest that much effort into all the updates. Well, some do; youtube-dl is basically webscraping and they're pretty up-to-date. It's just very rare.

[–] chebra@mstdn.io 1 points 5 months ago

@Dogyote @Zetaphor

And we also explored the AI option; it always turned out unrealistic. Either you scrape the content and send it to an AI to parse out the info, but then you'd be paying for every scrape (or running a powerful rig nonstop) and the results would still be hit and miss. Or you let the AI generate the code for the scraping module, which is still not ideal: it kept hallucinating things that weren't there.
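To see why the "pay for every scrape" branch is hard to sustain for a free app, here is a back-of-envelope cost sketch. Every number in it is a made-up placeholder, not real API pricing.

```python
# Hypothetical per-page LLM parsing cost for an open-source price tracker.
tokens_per_page = 8_000        # a stripped-down product page sent to the model
price_per_1k_tokens = 0.01     # placeholder API rate in USD, not a real price
pages_per_day = 5_000          # a modest multi-store crawl

daily_cost = tokens_per_page / 1_000 * price_per_1k_tokens * pages_per_day
print(f"${daily_cost:,.2f} per day")  # $400.00 per day
```

Even with generous assumptions, a recurring bill like that is exactly what a volunteer-run project can't absorb, which is the comment's point.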