For pages with tabular data in landscape format, the words in the HTML outcome overlap. For pages in portrait formats, the conversion is succesful. Any ideas how to fix that?
[Here is an example with the converted pdf to html in landscape format] [1]: https://i.stack.imgur.com/twbzw.png [2]: https://i.stack.imgur.com/Ln56P.png
import ntpath
from pathlib import Path
import fitz
doc = fitz.open(in_path) # open document
out = open(in_path + ".html", "wb") # open text output
for page in doc: # iterate the document pages
page.read_contents()
text = page.get_text('html', clip = None).encode("utf8")
out.write(text) # write text of page
out.write(bytes((12,))) # write page delimiter (form feed 0x0C)
out.close()