this post was submitted on 21 Dec 2023
Free and Open Source Software
I have had good results with Tesseract. I had to export the PDF to individual JPEGs, batch OCR them with Tesseract, then merge the individual pages back into a single PDF. If you don't want to use the command line and are okay with it not being open source, PDF24.org does a good job and is free of charge.
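For anyone who wants to script that, here is a rough sketch of the export, OCR, and merge steps in Python. The comment above doesn't say which tools did the export and merge, so `pdftoppm` and `pdfunite` (both from Poppler) are my assumption, and the function names and the `page` file prefix are made up for illustration. It expects `pdftoppm`, `tesseract`, and `pdfunite` to be on your PATH.

```python
import subprocess
from pathlib import Path


def export_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # Assumption: Poppler's pdftoppm handles the PDF -> JPEG export.
    # It writes one file per page, named <prefix>-<page number>.jpg.
    return ["pdftoppm", "-jpeg", "-r", str(dpi), str(pdf), str(prefix)]


def ocr_cmd(image: Path, lang: str = "eng") -> list[str]:
    # tesseract <image> <output base> -l <lang> pdf
    # The trailing "pdf" asks for a searchable PDF at <output base>.pdf.
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang, "pdf"]


def merge_cmd(pages: list[Path], out_pdf: Path) -> list[str]:
    # Assumption: Poppler's pdfunite merges the per-page PDFs, in the order given.
    return ["pdfunite", *map(str, pages), str(out_pdf)]


def ocr_pdf(pdf: Path, out_pdf: Path, workdir: Path, lang: str = "eng") -> None:
    """Export each page to JPEG, OCR every page, then merge into one PDF."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(export_cmd(pdf, workdir / "page"), check=True)

    # pdftoppm zero-pads the page numbers, so a plain sort keeps page order.
    images = sorted(workdir.glob("page-*.jpg"))
    for image in images:
        subprocess.run(ocr_cmd(image, lang), check=True)

    pages = [image.with_suffix(".pdf") for image in images]
    subprocess.run(merge_cmd(pages, out_pdf), check=True)
```

Calling something like `ocr_pdf(Path("scan.pdf"), Path("scan-ocr.pdf"), Path("ocr-work"))` would run the whole chain; the file names there are placeholders. The command builders are kept separate from the `subprocess.run` calls so you can print and check the commands before running anything.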
If you want to host it locally, Stirling PDF can be run in Docker, and it uses a library that in turn uses Tesseract. It has a bunch of other handy PDF operations, too. I keep it around for the two times a year I need to merge, split, or decrypt PDFs.
https://github.com/Frooodle/Stirling-PDF/blob/main/HowToUseOCR.md
It can do it straight from PDF and do multiple files at a time.
This is amazing. I did not realize it existed. Thank you for sharing.