I am trying to convert a simple docx file into an HTML file using the mammoth package. However, the generated HTML is only a fragment rather than a full document: the html, head, and body tags are all missing from the generated string.

Is there a parameter that makes mammoth produce a complete, valid HTML document?

jdhao

1 Answer

I read the docs and did not find an option to generate a full HTML document. Since the generated HTML is just a string, it is easy to wrap it into a complete document yourself:

import mammoth

with open("test.docx", "rb") as docx_file:
    result = mammoth.convert_to_html(docx_file)
    html = result.value  # The generated HTML
    messages = result.messages  # Any messages, such as warnings raised during conversion

    full_html = (
        '<!DOCTYPE html><html><head><meta charset="utf-8"/></head><body>'
        + html
        + "</body></html>"
    )

    with open("test.html", "w", encoding="utf-8") as f:
        f.write(full_html)

The code above simply prepends and appends the necessary opening and closing tags, so that the HTML string becomes a complete HTML document.
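If you need this in more than one place, the wrapping step can be pulled into a small helper. The sketch below is only an illustration: `wrap_html_fragment` and its `title` and `lang` parameters are hypothetical names, not part of mammoth. It also adds a `<title>` element, which HTML validators generally expect, and escapes the title text so that characters such as `&` do not break the markup.

```python
from html import escape


def wrap_html_fragment(fragment: str, title: str = "Document", lang: str = "en") -> str:
    """Wrap an HTML fragment in a minimal, complete HTML5 document.

    Hypothetical helper for illustration; not provided by mammoth.
    """
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang, quote=True)}">\n'
        "<head>\n"
        '<meta charset="utf-8">\n'
        # The title is plain text, so escape it before embedding it in markup.
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        # The fragment is already HTML, so it is inserted unchanged.
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )


# Stand-in fragment; in practice this would be result.value from mammoth.
page = wrap_html_fragment("<p>Hello</p>", title="Q&A notes")
```

With mammoth, you would pass `result.value` as the fragment and write the returned string to a file opened with `encoding="utf-8"`, as in the code above.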

jdhao