
I'm trying to convert DOC/DOCX files to HTML. So far I have found that LibreOffice can do this in headless mode. I can convert documents to HTML and get the images inline as well, using the command below (on OS X):

soffice --convert-to html:HTML:EmbedImages file_to_convert

On Ubuntu the command is:

libreoffice --convert-to HTML:HTML --outdir ${outputPath} ${file.fullPath}

When a document is converted from DOC to HTML, the fonts are not embedded in base64 format; the resulting HTML file contains no embedded fonts. Is there any way to embed the fonts as base64 in the HTML file so that the output HTML looks exactly like the original DOC/DOCX content?
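To make the goal concrete, here is a minimal sketch of what "embedding a font as base64" could look like as a post-processing step on the HTML that LibreOffice produces. It is not a LibreOffice feature; it only relies on the standard CSS `@font-face` rule with a `data:` URI. The function names `font_face_css` and `embed_font` are hypothetical, and the sketch assumes you have the font file available (and are licensed to redistribute it) and that the family name matches the one used in the generated HTML.

```python
import base64
import re


def font_face_css(family, font_bytes, mime="font/ttf", fmt="truetype"):
    """Build an @font-face rule whose source is a base64 data URI.

    `mime` and `fmt` default to TrueType; adjust them for other font formats.
    """
    b64 = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face { "
        f"font-family: '{family}'; "
        f"src: url(data:{mime};base64,{b64}) format('{fmt}'); "
        "}"
    )


def embed_font(html, family, font_bytes):
    """Insert a <style> block with the embedded font just before </head>.

    If the document has no </head> tag, the style block is prepended instead.
    """
    style = f"<style>{font_face_css(family, font_bytes)}</style>"
    match = re.search(r"</head>", html, re.IGNORECASE)
    if match is None:
        return style + html
    return html[:match.start()] + style + html[match.start():]
```

Usage would be along the lines of reading the converted file and a `.ttf` file as bytes, calling `embed_font(html, "Some Family", font_bytes)`, and writing the result back. This only makes the font available to the browser; it does not by itself guarantee that the layout matches the original document.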

Ryan
Vaibhav Bhuva

1 Answer


If you're not opposed to learning something new, I would suggest looking into the Python module python-docx.

With it, you can create or update DOCX files. It can also open existing DOCX files, so you can use it to write a custom conversion script. I've been using it to convert HTML to DOCX, and it has been very useful.
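As a rough illustration of what such a custom conversion script might look like, here is a minimal sketch. It assumes python-docx is installed (`pip install python-docx`) and uses only `Document(path)`, `paragraphs`, `paragraph.text` and `paragraph.style.name`. The helper names `blocks_to_html` and `docx_to_html` are made up for this example, and the sketch handles only headings and plain paragraphs: no images, tables, fonts, or inline formatting. Note that python-docx reads `.docx` files, not the older `.doc` format.

```python
from html import escape


def blocks_to_html(blocks):
    """Render (style_name, text) pairs as a minimal HTML fragment."""
    parts = []
    for style_name, text in blocks:
        if not text.strip():
            continue  # skip empty paragraphs
        # Word's built-in heading styles are named "Heading 1", "Heading 2", ...
        if style_name.startswith("Heading ") and style_name[8:].isdigit():
            level = max(1, min(int(style_name[8:]), 6))  # HTML has h1..h6 only
            parts.append(f"<h{level}>{escape(text)}</h{level}>")
        else:
            parts.append(f"<p>{escape(text)}</p>")
    return "\n".join(parts)


def docx_to_html(path):
    """Open a .docx file with python-docx and convert its paragraphs."""
    from docx import Document  # third-party: python-docx

    doc = Document(path)
    return blocks_to_html((p.style.name, p.text) for p in doc.paragraphs)
```

Calling `docx_to_html("report.docx")` (the file name is a placeholder) would return an HTML fragment as a string. Keeping the rendering in `blocks_to_html` separate from the file loading makes it easy to extend the mapping, for example to handle list styles, without touching the python-docx part.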

Ge To