
I am quite new to Python, so this may be a silly question, but I can't seem to find the solution anywhere.

I have a Django site that I am running locally on my machine, just for development. On the site I want to convert a docx file to PDF, and I want to use pandoc to do this. I know there are other methods, such as online APIs or Python modules like "docx2pdf". However, I want to use pandoc for deployment reasons.

I installed pandoc from my terminal using brew install pandoc, so it should be installed correctly.

In my Django project I am doing:

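import docx
import pypandoc
from django.http import FileResponse

def making_a_doc_function(request):
    # Build a small Word document and save it to disk
    doc = docx.Document()
    doc.add_heading("MY DOCUMENT")
    doc.save('thisisdoc.docx')
    # Try to convert the saved docx to PDF (this is the call that fails)
    pypandoc.convert_file('thisisdoc.docx', 'docx', outputfile="thisisdoc.pdf")
    # Return the PDF as the HTTP response
    pdf = open('thisisdoc.pdf', 'rb')
    response = FileResponse(pdf)
    return response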

The docx file gets created without a problem, but no PDF is created. I am getting an error that says:

Pandoc died with exitcode "4" during conversion: b'cannot produce pdf output from docx\n'

Does anyone have any ideas?

Kitchen

1 Answer


The second argument to convert_file is the output format or, in this case, the format through which pandoc generates the PDF. Pandoc doesn't know how to produce a PDF by way of docx, hence the error.

Use pypandoc.convert_file('thisisdoc.docx', 'latex', outputfile="thisisdoc.pdf") or pypandoc.convert_file('thisisdoc.docx', 'pdf', outputfile="thisisdoc.pdf") instead.
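As a rough sketch of how the corrected call could be wrapped, something like the following might work. The helper name docx_to_pdf and its convert parameter are made up for illustration (they are not part of pypandoc); the parameter only exists so the function can be tried without pandoc installed. It assumes pandoc and a LaTeX engine are available when the real converter is used.

```python
from pathlib import Path


def docx_to_pdf(docx_path, pdf_path, convert=None):
    """Convert a .docx file to PDF, going through LaTeX.

    `convert` is a hypothetical hook for illustration; by default the
    real pypandoc.convert_file is used.
    """
    if convert is None:
        # Imported lazily so this module still loads where pypandoc is absent.
        import pypandoc
        convert = pypandoc.convert_file

    pdf_path = Path(pdf_path)
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError("output file must end in .pdf")

    # 'latex' is the format pandoc goes through; the .pdf suffix of
    # outputfile is what makes it write a PDF.
    convert(str(docx_path), "latex", outputfile=str(pdf_path))
    return pdf_path
```

In the view from the question, the convert_file line would then become docx_to_pdf('thisisdoc.docx', 'thisisdoc.pdf'), with the rest of the function unchanged.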

tarleb
  • pypandoc.convert_file('thisisdoc.docx', 'latex', outputfile="thisisdoc.pdf") This one worked, but I also had to get pdflatex installed on the same path. – Kitchen Nov 12 '20 at 23:25
  • This solution works, but the generated PDF has a "latex feeling" (even if "pdf" is set as the second argument and not "latex"). By "latex feeling" I mean it looks like rendered LaTeX rather than a docx file converted to PDF (similar to how MS Word does it). Also, none of the docx formatting (alignments, bold words, etc.) is preserved. – pcko1 Jun 24 '22 at 20:29
  • @pcko1 that's by design, pandoc transforms the content, not the style (but bold words are relevant for the content and should be preserved). If, for whatever reason, you want to preserve the Word look, then use Word or LibreOffice command line tools to produce a PDF. – tarleb Jun 25 '22 at 09:34
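For the LibreOffice route mentioned in the last comment, a minimal sketch could look like this. It assumes the LibreOffice launcher is reachable as soffice on the PATH (on macOS it usually lives inside the application bundle, so the full path may need to be passed instead). Both function names are hypothetical helpers, not part of any library.

```python
import subprocess
from pathlib import Path


def libreoffice_pdf_command(docx_path, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command that converts a file to PDF."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def docx_to_pdf_with_libreoffice(docx_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the PDF should appear."""
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(libreoffice_pdf_command(docx_path, out_dir, soffice), check=True)
    # LibreOffice keeps the input file's base name and swaps the extension.
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

Unlike the pandoc route, this keeps the Word layout, at the cost of needing LibreOffice installed wherever the site is deployed.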