
Please help me figure out the most suitable solution to the following problem.

I am writing an app that replaces words in a text, changing one spelling to another; for example, replacing every occurrence of “colour” with “color”.

The code already does this, but only by reading from and writing to .txt files, which Python handles natively.
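For reference, the plain-text case can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `REPLACEMENTS` mapping and the function names `replace_words` and `convert_txt_file` are hypothetical, and matching is assumed to be case-sensitive and limited to whole words.

```python
import re
from pathlib import Path

# Hypothetical mapping, for illustration only: old spelling -> new spelling.
REPLACEMENTS = {"colour": "color"}


def replace_words(text, replacements):
    """Replace whole-word, case-sensitive occurrences of each key with its value."""
    if not replacements:
        return text
    # \b keeps "colour" from matching inside longer words such as "colours".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in replacements) + r")\b"
    )
    return pattern.sub(lambda match: replacements[match.group(1)], text)


def convert_txt_file(source, target, replacements, encoding="utf-8"):
    """Read a .txt file, apply the replacements, and write the result."""
    text = Path(source).read_text(encoding=encoding)
    Path(target).write_text(replace_words(text, replacements), encoding=encoding)
```

Whatever library ends up reading the other formats, a function like `replace_words` can stay unchanged as long as each reader hands it a plain string.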

However, I would like it to read from (and eventually write to) other text formats. So I went looking for solutions and found two: textract and pandoc. Textract requires a long list of libraries and programs to be installed beforehand.

Pandoc only requires installing Pandoc itself before running `pip install pypandoc`, which looked nicer. (The Pandoc installation guide notes that if you also want to write PDF, you will need to install LaTeX.)

My aim is to build a platform-independent app. My questions are:

Will the app's user have to install Pandoc (and eventually LaTeX) on their machine to be able to use the app?

Would it be preferable (although very unprofessional, I suppose) to warn the user that the app works only with .txt files, so any other text must be copied and pasted into one?
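Both questions could be combined in one approach: detect at runtime whether a `pandoc` executable is on the user's PATH and fall back to .txt only when it is not. The sketch below uses only the standard library; the function names and the set of extra extensions are assumptions chosen for illustration, not a statement of everything Pandoc supports.

```python
import shutil


def pandoc_available():
    """Return True if a `pandoc` executable can be found on the PATH."""
    return shutil.which("pandoc") is not None


def supported_extensions():
    """Extensions the app would accept (hypothetical policy for this sketch)."""
    extensions = {".txt"}  # always handled by plain Python
    if pandoc_available():
        # Assumed subset of formats to hand over to pandoc when it is installed.
        extensions |= {".docx", ".odt", ".html"}
    return extensions


if __name__ == "__main__":
    if not pandoc_available():
        print("Pandoc was not found; only .txt files can be processed.")
    print("Accepted extensions:", ", ".join(sorted(supported_extensions())))
```

This way the user gets a clear message instead of a failure, and the .txt path keeps working on every platform.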

  • I would ask the person who just downvoted my question to explain why. What do you think is wrong with the question? – J. Partridge Dec 29 '17 at 17:34
  • What formats do you need to open? The `open` function can handle a lot of formats. – rassar Dec 29 '17 at 19:19
  • I can only guess the reasons leading to a person downvoting the question, but a probable reason is that it's asking for a comparison of two tools, doesn't have a definite (non-opinion based) answer, and is only tangentially programming related. It might actually be off topic for stackoverflow. There are also a few things which are unclear: which platform do you need to target? Which output formats do you need? Later pandoc versions can use groff to generate PDF, so maybe you won't need LaTeX. – tarleb Dec 29 '17 at 22:24
  • rassar, I aimed to open DOC, DOCX, HTML, ODT, PDF, RTF, TXT and XML. Among other things, I read this before I came here: http://okfnlabs.org/blog/2013/10/17/python-guide-for-file-formats.html – J. Partridge Dec 29 '17 at 23:09
  • tarleb, as I say up there I would like the app to be platform independent. It should be able to run on Windows, macOS, Linux, and Android. Before I came here I read this: https://pandoc.org/installing.html – J. Partridge Dec 29 '17 at 23:22
  • @J.Partridge my bad, I missed that. I don't know if pandoc can be used on Android. At least there are no binaries available for Android. I'm not sure whether cross-compiling would work, as pandoc is written in Haskell. As for the rest: there are many apps which require the user to install pandoc separately, mostly for licensing reasons (pandoc is GPLv2+). I know nothing about textract, so I can't comment on that. – tarleb Dec 30 '17 at 08:42

1 Answer


A few "moons" later I have an answer to my own question, so I am sharing it. (Isn't that the reason we all come here?) The code for the app is complete and working, and in the end I used neither textract nor pandoc. Here is a list of the modules I used instead: PyPDF, docx, ezodf, beautifulsoup, ebooklib, plus some others as auxiliaries.

I am not pleased about having so many imports. Someone told me that I could do all this using just the NLTK library. Can anyone confirm this before I commit to studying it? Thank you.
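One way to keep many format-specific imports manageable is to hide each behind a small reader function and dispatch on the file extension. The sketch below is an assumption about how that could look, not the app's actual code: it uses only the standard library, so only .txt and HTML are covered, and the names `READERS`, `read_txt`, `read_html`, and `extract_text` are hypothetical. Readers built on the third-party modules listed above would be registered in the same table.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document (script/style text included)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def read_txt(path):
    return Path(path).read_text(encoding="utf-8")


def read_html(path):
    parser = _TextExtractor()
    parser.feed(Path(path).read_text(encoding="utf-8"))
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Hypothetical registry: one reader per extension. Readers for PDF, DOCX,
# ODT, and so on would be added here, each importing its own library.
READERS = {".txt": read_txt, ".html": read_html, ".htm": read_html}


def extract_text(path):
    """Return the plain text of a file, choosing the reader by extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return reader(path)
```

The rest of the app then calls only `extract_text`, and adding or dropping a format means changing one entry in the table.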