Python: HTML string to PDF via pdfkit: keep an image from spanning two pages

Question

I want to convert an HTML string to PDF with pdfkit and Python. The HTML string includes an image. The problem is that the image gets split across two pages.

Assuming the image fits on one page,

how can I keep it from spanning two pages with pdfkit and Python?
Or, if pdfkit can't do it, are there other methods?

The source is an HTML string, so I can't calculate whether the space left on a page is enough to hold the image. Any ideas? Thank you.

The code follows. qqaa stands in for the base64 image data. Including the real data would exceed Stack Overflow's 30000-character limit, so the code below won't run as is. I don't know how to attach the Python script.

import pdfkit

html_str = ''

for i in range(1,15):
    html_str += '<p>many row</p>'

html_str += '<h3>3.1.12 Draw</h3><h4>3.1.13.1 3D</h4><img src="data:image/png;base64,qqaa" alt="3D Structure">'

opt = {'encoding': 'UTF-8', 'orientation': 'Landscape', 'margin-top': '0.5in', 'margin-bottom': '0.5in', 'margin-left': '0.75in', 'margin-right': '0.75in', 'outline-depth': 6, 'header-center': 'whatever', 'header-right': 'Page: [page]/[toPage]', 'header-line': '', 'header-spacing': 2, 'footer-right': 'Date: [date]', 'footer-line': '', 'footer-spacing': 2, 'enable-local-file-access': None}
pdfkit.from_string(html_str, 'out.pdf', options=opt)


Edit: one solution is to put <P style="page-break-before: always"> directly in the html string.

<P style="page-break-before: always"><img src="data:image/png;base64,qqaa" alt="3D Structure">
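
For an HTML string that is generated elsewhere, the same idea can be applied programmatically. The following is only a minimal sketch: `break_before_images` is a hypothetical helper (not part of pdfkit), and it assumes that no `<img>` attribute contains a literal `>` character, which holds for base64 data URIs.

```python
import re

# Opening tag taken from the edit above; the closing </p> is added by the helper.
PAGE_BREAK_OPEN = '<p style="page-break-before: always">'


def break_before_images(html):
    """Wrap every <img ...> tag in a paragraph that forces a page break before it.

    Hypothetical helper for illustration. It relies on a simple regex, so it
    assumes no attribute value of an <img> tag contains a '>' character.
    """
    return re.sub(
        r'<img\b[^>]*>',
        lambda match: PAGE_BREAK_OPEN + match.group(0) + '</p>',
        html,
        flags=re.IGNORECASE,
    )


# Same shape as the HTML string in the question (qqaa is a placeholder).
html_str = '<p>many row</p>' * 14
html_str += '<h3>3.1.12 Draw</h3><img src="data:image/png;base64,qqaa" alt="3D Structure">'

patched = break_before_images(html_str)
```

The trade-off is that every image then starts on a new page, even when it would have fit on the current one, and a heading placed just before the image stays behind on the previous page.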

Tranbi · Answer 1 · 2023-08-08T21:04:15.477


You can try applying the CSS property page-break-inside: avoid to img elements.

Edit: create a file img.css with the following:

img {
  page-break-inside: avoid !important;
}

And pass the file path to pdfkit.from_string:

import pdfkit

html_str = ''

for i in range(1,15):
    html_str += '<p>many row</p>'

html_str += '<h3>3.1.12 Draw</h3><h4>3.1.13.1 3D</h4><img src="data:image/png;base64,qqaa" alt="3D Structure">'

opt = {'encoding': 'UTF-8', 'orientation': 'Landscape', 'margin-top': '0.5in', 'margin-bottom': '0.5in', 'margin-left': '0.75in', 'margin-right': '0.75in', 'outline-depth': 6, 'header-center': 'whatever', 'header-right': 'Page: [page]/[toPage]', 'header-line': '', 'header-spacing': 2, 'footer-right': 'Date: [date]', 'footer-line': '', 'footer-spacing': 2, 'enable-local-file-access': None}

pdfkit.from_string(html_str, 'out.pdf', options=opt, css='img.css')

Note: it seems that this property is now being replaced by break-inside.
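
If keeping a separate img.css next to the script is inconvenient, the stylesheet can be written to a temporary file first. This is a sketch under assumptions: `write_css` and `render` are hypothetical helper names, the `break-inside` line is added only because of the note above, and whether wkhtmltopdf actually honors either rule is a separate question that this code does not settle.

```python
import os
import tempfile

# Stylesheet from the answer, plus the newer break-inside property (assumption:
# listing both is harmless if the renderer ignores one of them).
IMG_CSS = (
    "img {\n"
    "  page-break-inside: avoid !important;\n"
    "  break-inside: avoid;\n"
    "}\n"
)


def write_css(css_text=IMG_CSS):
    """Write the stylesheet to a temporary .css file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".css")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(css_text)
    return path


def render(html_str, out_path, options=None):
    """Render html_str to out_path, passing the CSS as a file path.

    pdfkit is imported lazily because it needs both the pdfkit package
    and a wkhtmltopdf binary to be installed.
    """
    import pdfkit

    css_path = write_css()
    try:
        pdfkit.from_string(html_str, out_path, options=options, css=css_path)
    finally:
        os.remove(css_path)  # clean up the temporary stylesheet
```

With the variables from the snippet above, the call would be `render(html_str, 'out.pdf', options=opt)`.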

edited Aug 08 '23 at 21:04

answered Aug 08 '23 at 11:23


The CSS `page-break-inside` for img or figure is not recognized by pdfkit: `OSError: [Errno 22] Invalid argument: '\nimg {\n page-break-inside: avoid;\n}\n'`. Neither is `break-inside`. Inspired by your CSS suggestion, and after some Googling, the solution is to put `<P style="page-break-before: always">` directly in the html string, say, `<P style="page-break-before: always"><img src="data:image/png;base64,qqaa" alt="3D Structure">`
– warem Aug 08 '23 at 20:41
After double checking, it seems that the kwarg `css` takes a file path as its argument (see https://github.com/JazzCore/python-pdfkit/blob/c83fa250f85b210f4a0f05ca613ec0fb9580732e/pdfkit/api.py#L54). I've updated my answer accordingly. I cannot test it though, so let me know! – Tranbi Aug 08 '23 at 21:04
Also, adding `!important` might help in some cases. – Tranbi Aug 08 '23 at 21:08

With a CSS file, pdfkit runs without the 'Invalid argument' complaint. But `page-break-inside: avoid !important;` and `page-break-before: always;` don't work. After some Googling, it sounds like an issue in pdfkit (wkhtmltopdf) that `page-break-xxx` doesn't work. – warem Aug 08 '23 at 21:44
