~Since this converts to Markdown, it's inherently a very lossy conversion. What's hard to pull off is preserving the full formatting when converting to ODT or a similar format.~

Someone pointed out that it doesn't just convert Word documents to Markdown; it can also transcribe and OCR, so I guess it does have some usefulness!

[–] davel@lemmy.ml 9 points 2 weeks ago (2 children)

In saying this isn’t useful, you’re making a lot of assumptions about how someone might want to use this.

They may not care that it is lossy in the way that it is lossy.
They may want a CLI tool instead of a GUI tool.
They may want it as a Python library rather than as a stand-alone tool.

[–] vort3@lemmy.ml 3 points 2 weeks ago

I convert from docx to md specifically to get rid of Microsoft formatting, i.e. converting almost to plaintext while preserving at least some structure.

[–] utopiah@lemmy.ml 2 points 2 weeks ago

soffice works as a CLI, can be called from Python, and has plenty of related tooling, e.g. https://pypi.org/project/unoserver/. So I agree: I'm confused about what's actually novel here and better than that, or even than dedicated, long-lasting FLOSS projects like pandoc.
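
For anyone who wants to try the soffice route from Python, here is a minimal sketch. It assumes `soffice` is on your PATH; the helper names `build_convert_command` and `convert`, the `converted` output directory, and the `report.docx` file name are made up for illustration. It only shells out to LibreOffice's headless converter, it is not how markitdown or unoserver work internally.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, target_format="odt", outdir="converted"):
    """Build the argument list for a headless LibreOffice conversion.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "soffice",
        "--headless",                    # no GUI
        "--convert-to", target_format,   # e.g. "odt", "pdf", "txt"
        "--outdir", str(outdir),
        str(src),
    ]


def convert(src, target_format="odt", outdir="converted"):
    """Run the conversion and return the path where the output is expected.

    Raises FileNotFoundError if soffice is not installed and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(src, target_format, outdir), check=True)
    return Path(outdir) / f"{Path(src).stem}.{target_format}"


# Example call (needs LibreOffice installed and an existing input file):
# convert("report.docx", "odt")
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to check what would be executed without having LibreOffice installed.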

[–] django@discuss.tchncs.de 3 points 2 weeks ago (1 children)

I like LibreOffice, but converting audio files to Markdown must be a pretty recent feature; I have never heard of it being part of LibreOffice.

[–] utopiah@lemmy.ml 2 points 2 weeks ago (1 children)

converting audio files to markdown must be a pretty recent feature

Quite curious... does it actually do that, and if so, how? STT to get a plaintext file or subtitles (so with timing) has been available quite efficiently for a while now via e.g. Whisper. If this does do more, though, e.g. structure (differentiating a title, a list, etc.), I'd like to learn how.

[–] django@discuss.tchncs.de 3 points 2 weeks ago (1 children)

There is nothing special going on. This whole project is just a bunch of Python libraries coupled together into a CLI tool. It uses the SpeechRecognition package to connect to the Google speech recognition API: https://github.com/microsoft/markitdown/blob/main/src/markitdown/_markitdown.py#L691

Pretty uninteresting and a bit disappointing. Pandoc is a lot more interesting.

[–] utopiah@lemmy.ml 1 points 2 weeks ago (1 children)

Thanks for the clarification. I checked the code you linked and noticed recognize_google. It seems to rely on https://github.com/Uberi/speech_recognition, which in turn seems to rely on https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/recognizers/google.py. So basically they are using an API and sending all the audio data to Google servers?

[–] django@discuss.tchncs.de 1 points 2 weeks ago (1 children)

Yes, this is how I read it as well. The library would support using a local model, but they decided to just send the audio data to Google.
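
To make the difference concrete, here is a rough sketch of the two paths with the SpeechRecognition package. The function names `recognizer_method_name` and `transcribe` are made up for this example, and this is not markitdown's actual code. It assumes `pip install SpeechRecognition`, plus `pocketsphinx` for the offline path; `recognize_sphinx` is just one of the local engines the library offers and was picked here as an assumption.

```python
def recognizer_method_name(offline):
    """Pick which Recognizer method to call (hypothetical helper).

    recognize_sphinx runs locally via pocketsphinx;
    recognize_google uploads the audio to Google's web speech API.
    """
    return "recognize_sphinx" if offline else "recognize_google"


def transcribe(wav_path, offline=True):
    """Transcribe a WAV/AIFF/FLAC file and return the recognized text."""
    # Imported lazily so the helper above works without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    # Look up the chosen engine on the recognizer and run it on the audio.
    method = getattr(recognizer, recognizer_method_name(offline))
    return method(audio)


# Example call (needs the packages above and a real audio file):
# print(transcribe("meeting.wav", offline=True))
```

With `offline=False` the audio leaves the machine, which is the behaviour being discussed here; with `offline=True` nothing is sent anywhere, at the cost of an extra dependency.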

[–] utopiah@lemmy.ml 3 points 2 weeks ago

Might open up a GDPR-related issue there. I don't think people using such a library assume that they need connectivity, or that their data would be sent to a third party.