I am using PyQt5 with QWebEnginePage to convert an HTML file to PDF.

The HTML is generated by exporting a Google Docs document as HTML, and it contains some Chinese characters.

The HTML file is about 44 KB and does not include any images.

However, when I export the HTML to PDF using QWebEnginePage::printToPdf, the resulting PDF is unexpectedly large, around 15 MB.

Does anyone have an idea why the PDF is so big? I currently suspect that, because the HTML contains Chinese characters, the web engine embeds the font in the PDF.

Danny Lau
You are probably right to suspect the size of the embedded font file. You can use PyMuPDF to reduce the PDF file size by creating font subsets: open the PDF as a PyMuPDF document with `doc = fitz.open(filename)`, call `doc.subset_fonts()`, then save to a new file with `doc.ez_save("new.pdf")`. This should work if the embedded font is a TTF or OTF. – Jorj McKie Feb 01 '23 at 08:46
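
The three steps in the comment above could be wrapped up roughly as follows. This is only a sketch: the function names `subset_pdf_fonts` and `shrink_ratio` and the file names are made up for illustration, and it assumes PyMuPDF is installed together with the fontTools package, which `subset_fonts()` relies on. Whether it helps depends on the fonts actually being embedded as TTF or OTF.

```python
import os


def shrink_ratio(before_bytes, after_bytes):
    """Return the fraction of the original size that was saved."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return 1.0 - after_bytes / before_bytes


def subset_pdf_fonts(src_path, dst_path):
    """Subset the embedded fonts of src_path and write the result to dst_path.

    Returns the file sizes (before, after) in bytes.
    """
    # PyMuPDF is imported here so that shrink_ratio works without it.
    import fitz

    doc = fitz.open(src_path)
    try:
        doc.subset_fonts()     # keep only the glyphs the document uses
        doc.ez_save(dst_path)  # save with garbage collection and compression
    finally:
        doc.close()
    return os.path.getsize(src_path), os.path.getsize(dst_path)


# Example call (hypothetical file names):
# before, after = subset_pdf_fonts("output.pdf", "new.pdf")
# print(f"{before} -> {after} bytes, saved {shrink_ratio(before, after):.0%}")
```

Comparing the two sizes is a quick way to check the suspicion in the question: if the file shrinks dramatically after subsetting, the embedded font was most likely what made the PDF so large.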
