this post was submitted on 09 Oct 2024
612 points (96.6% liked)

Technology


I suspect that this is the direct result of AI generated content just overwhelming any real content.

I tried DDG, Google, Bing, and Qwant, and none of them really helps me find the information I want these days.

Perplexity seems to work, but I don't like the idea of AI giving me "facts", since they are mostly based on other AI posts.

ETA: someone suggested SearXNG, and after using it a bit it seems to be much better than DDG and the rest.

[–] phoenixz@lemmy.ca 3 points 1 month ago (2 children)

So what about open-source, self-hosted search engines? If it requires some hardware, I'd gladly team up with a small group of people to finance a bigass server that just gets us our own personal search engine.

Any good ones out there?

[–] MTK@lemmy.world 2 points 1 month ago (1 children)

SearXNG, but there are plenty of instances already.

[–] Hawk@lemmynsfw.com 2 points 1 month ago (1 children)

Perplexica is interesting too, but it uses a moderate amount of RAM because of Elasticsearch.

And of course you need to have Ollama running.

[–] MTK@lemmy.world 1 points 1 month ago
[–] LonelyNematocyst@lemmy.world 1 points 1 month ago

There's stuff like SearXNG or Whoogle, but these aren't "real" search engines, merely "search aggregators" - they relay requests to a bunch of actual search engines, like Bing or Google, and aggregate the results. That's why they don't require tons of compute or scraping, and also why they often fail to work (the search engines in question don't like or allow this). I believe it's not feasible to run a "real" search engine alone or even as a small group of people - according to this comment you need a powerful server with terabytes* of storage, hundreds of gigabytes of RAM and a lot of compute - and all of this will only let you crawl some top domains, nowhere near a good chunk of the internet.

*which actually sounds low; I would have expected more for this
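
To make the "aggregator" idea concrete, here is a rough sketch of what relaying and merging could look like. It is not how SearXNG or Whoogle actually rank things; the three backend functions are made-up stand-ins for real upstream requests, and the 1/rank scoring is just one simple way to merge lists that I picked for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A backend takes a query and returns result URLs, best first.
Backend = Callable[[str], List[str]]


def fake_engine_a(query: str) -> List[str]:
    # Hypothetical stand-in for a request relayed to an upstream engine.
    return ["https://example.org/a", "https://example.org/b", "https://example.org/c"]


def fake_engine_b(query: str) -> List[str]:
    # Second hypothetical upstream with partly overlapping results.
    return ["https://example.org/b", "https://example.org/d"]


def blocked_engine(query: str) -> List[str]:
    # Simulates an upstream that refuses relayed requests.
    raise RuntimeError("upstream blocked the request")


def aggregate(query: str, backends: Dict[str, Backend]) -> List[str]:
    """Query every backend, skip the ones that fail, merge by summed 1/rank."""
    scores: Dict[str, float] = defaultdict(float)
    for name, search in backends.items():
        try:
            urls = search(query)
        except Exception:
            # A failing upstream just contributes nothing.
            continue
        for rank, url in enumerate(urls, start=1):
            # URLs returned by several engines accumulate a higher score.
            scores[url] += 1.0 / rank
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    engines = {"a": fake_engine_a, "b": fake_engine_b, "blocked": blocked_engine}
    for url in aggregate("self hosted search", engines):
        print(url)
```

Note there is no crawler or index anywhere in this: all the heavy lifting stays with the upstream engines, which is why it runs on almost nothing and also why it breaks as soon as an upstream starts blocking you.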