Can't read the .docx file I got after converting a PDF with the soffice command

Question

I am trying to convert a PDF to .docx using soffice. The conversion succeeds, but the resulting .docx contains text boxes, which I am unable to read using the docx API for Python. Is there a better way to read the file, or a better way to convert PDF to .docx so that I do not get text boxes?

soffice --infilter="writer_pdf_import" --convert-to docx "convert_this.pdf"

score 0 · Answer 1 · answered Dec 16 '19 at 09:27

You can try using Aspose.Words for Cloud to convert PDF to Word documents: https://docs.aspose.cloud/display/wordscloud/Convert+PDF+Document+to+Word

It converts the PDF from fixed form to flow form, so the result is editable in MS Word.

Disclosure: I work on the Aspose.Words team.
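
If changing the converter is not an option, the text inside the text boxes can usually still be reached by reading the document XML directly. The following is only a minimal sketch using the Python standard library, not the docx API mentioned in the question. It assumes the converted file stores its text boxes as w:txbxContent elements in word/document.xml, which is how WordprocessingML represents text-box content; whether a particular soffice output looks exactly like this has not been verified here. The function name textbox_paragraphs is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML and markup-compatibility namespaces.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def _walk(element):
    """Yield all descendants, skipping mc:Fallback branches.

    Text boxes are often written twice (a modern form plus a legacy
    fallback), so skipping the fallback avoids duplicated text.
    """
    for child in element:
        if child.tag == f"{{{MC_NS}}}Fallback":
            continue
        yield child
        yield from _walk(child)


def textbox_paragraphs(docx_path):
    """Return the text of each non-empty paragraph inside a text box.

    Hypothetical helper: assumes text boxes appear as w:txbxContent
    in word/document.xml. Nested text boxes are not handled specially.
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for node in _walk(root):
        if node.tag != f"{{{W_NS}}}txbxContent":
            continue
        for para in _walk(node):
            if para.tag != f"{{{W_NS}}}p":
                continue
            # A paragraph's text is split across one or more w:t runs.
            text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
            if text:
                paragraphs.append(text)
    return paragraphs
```

For example, textbox_paragraphs("convert_this.docx") would return a list of strings, one per text-box paragraph, in document order. Text outside text boxes is ignored by this sketch and can still be read the usual way.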
