I am trying to convert a PDF to DOCX using soffice. The conversion produces a .docx file, but the content ends up in text boxes, which I cannot read with the Python docx API. Is there a better way to read the file, or a better way to convert PDF to DOCX so that the output does not contain text boxes?

soffice --infilter="writer_pdf_import" --convert-to docx "convert_this.pdf"
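For reference, a minimal sketch of one way the text-box contents might be read without the docx API, using only the Python standard library. It assumes the converted file stores each text box as a `w:txbxContent` element in `word/document.xml`, and that a duplicate copy may sit inside an `mc:Fallback` element for older readers; the function name `extract_textbox_text` is hypothetical, and nested text boxes are not handled separately.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML and markup-compatibility namespaces.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def _collect(elem, out):
    """Walk the XML tree, appending the text of each text box to `out`."""
    # Skip the fallback copy so each text box is reported only once.
    if elem.tag == f"{{{MC}}}Fallback":
        return
    if elem.tag == f"{{{W}}}txbxContent":
        paragraphs = []
        for p in elem.iter(f"{{{W}}}p"):
            # A paragraph's text is the concatenation of its w:t runs.
            paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
        out.append("\n".join(paragraphs))
        return
    for child in elem:
        _collect(child, out)


def extract_textbox_text(docx_path):
    """Return a list with one string per text box found in the main document part."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    boxes = []
    _collect(root, boxes)
    return boxes
```

Calling `extract_textbox_text("convert_this.docx")` on the converted file would then return the text boxes in document order, if the assumptions above hold for the soffice output.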
Saucy Goat

1 Answer


You can try Aspose.Words for Cloud to convert the PDF to a Word document: https://docs.aspose.cloud/display/wordscloud/Convert+PDF+Document+to+Word. It converts the PDF from a fixed-layout form to a flow form, so the result is editable in MS Word.

Disclosure: I work on the Aspose.Words team.

Alexey Noskov