this post was submitted on 20 Aug 2024

Say I have a large TXT or CSV file with data I want to search, and say I have several such files.

What is the best way to index and make this data searchable? I've been using grep, but it is not ideal.

Is there a self-hostable Docker container for indexing and searching this? Or should I maybe use SQL?

[–] AnnaFrankfurter@lemmy.ml 3 points 2 months ago* (last edited 2 months ago)

It depends on the size of the data and on the use case. Will you be updating it constantly or just reading it? You mentioned several files, so do you need joins, and if so, roughly what is the maximum number of joins per query? Since you said CSV, I'm assuming the data is structured rather than semi-structured or unstructured.

A few more questions: do you just need fast indexing, without any complex operations? Are you going to do a lot of OLTP operations and need ACID, or are you going the OLAP route? Are you planning on a distributed database, and if so, which two of the CAP properties do you want? Do you want batch processing or stream processing?

I have a few dozen other questions as well.
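
For the simplest answer to those questions (structured CSV, mostly reads, single machine, no distribution), here is a rough sketch of the "just use SQL" route with Python's built-in `sqlite3` and `csv` modules. The `load_csv` helper, the `people` table, and the `id,name,city` columns are all made up for illustration, and it assumes every row has the same number of fields as the header. Treat it as a starting point, not a recommendation for your actual data.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file, index_column):
    """Load a CSV stream into a new SQLite table and index one column.

    Hypothetical helper: table and column names come from trusted input
    here, so they are interpolated into the SQL; values are bound with '?'.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first row holds the column names
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    # A B-tree index makes equality and prefix lookups on this column fast.
    conn.execute(
        f'CREATE INDEX "idx_{table}_{index_column}" ON "{table}" ("{index_column}")'
    )
    conn.commit()


# Made-up sample data standing in for one of the CSV files.
sample = io.StringIO("id,name,city\n1,alice,Berlin\n2,bob,Paris\n3,carol,Berlin\n")

conn = sqlite3.connect(":memory:")  # use a file path to persist the database
load_csv(conn, "people", sample, "city")

rows = conn.execute(
    "SELECT name FROM people WHERE city = ? ORDER BY name", ("Berlin",)
).fetchall()
print(rows)  # [('alice',), ('carol',)]
```

Loading each file into its own table this way also makes joins across files possible. If the answers to the questions above point toward heavy analytics or a distributed setup, this sketch stops being a good fit.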