
I want to convert a web page to a PDF using pdfkit (which uses wkhtmltopdf), but I am having trouble getting the pictures from the web page into the PDF.

This is my code:

import pdfkit

config = pdfkit.configuration(wkhtmltopdf='C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe')

options = {
    # Allow access to local files (images)
    'enable-local-file-access': None,
    # Do not disable the inclusion of images    
    'no-images': None
}

pdfkit.from_url(
    'https://stadt-bremerhaven.de/eve-flare-portables-stimmungslicht-mit-thread-ausprobiert/',
    'out.pdf',
    configuration=config,
    options=options,
    verbose=True
)

The pictures in the article are not included in the resulting PDF file. Is there a way to include them, or is there another library that can do it?

accdias

1 Answer


Headless printing of remote sites has several pitfalls. To see what is going on, it is best to call the command-line engine directly, so in this case

wkhtmltopdf.exe https://stadt-bremerhaven.de/eve-flare-portables-stimmungslicht-mit-thread-ausprobiert/ out.pdf

should produce this result: [screenshot]

Problematic output like this is not unusual: wkhtmltopdf cannot accept a cookie prompt itself, because a raw headless run is non-interactive.

One way around that is to collect the cookies from a browser and pass them to wkhtmltopdf:

wkhtmltopdf.exe --cookie "__cmpconsentx47085" "CPuoyhgPuoyhgAfQ9BENDNCgAP_AAH_AAAigJSkR5D5MDWFBWX57QMskWYUX0MAVZyADChaAAaABCDAAcKQAkkEaIAyAAAACAQgAIBYBAAAADAlAAEAQQIhBAAHgAgAEoBAIIAAEABERQUIAAAoKAIgAEAAIAAExKECAkALQAobiREAAkIAiQIAAgAAAAIABAhMAAAAIAAACAAIAAACAAAAAAAAAAAACABAAAAAAAAAAIJSkR5D5MDWFBWX57QMskWYUX0MAVZyADChaAAaABCDAAcKQAkkEaIAyAAAACAQgAIBYBAAAADAlAAEAQQIhBAAHgAgAEoBAIIAAEABERQUIAAAoKAIgAEAAIAAExKECAkALQAobiREAAkIAiQIAAgAAAAIABAhMAAAAIAAACAAIAAACAAAAAAAAAAAACABAAAAAAAAAAIAA" --cookie "__cmpcccx47085" "aBPuqTTPgAACgALAAuABoAEoKWwAAA" https://stadt-bremerhaven.de/eve-flare-portables-stimmungslicht-mit-thread-ausprobiert/ out.pdf

This provides the following output: [screenshot]
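Since the question uses pdfkit, here is a minimal sketch of passing the same cookies through pdfkit's options instead of the command line. It relies on pdfkit expanding a list of tuples into repeated `--cookie name value` arguments. The helper names `build_options` and `render` are hypothetical, and the cookie values have to be copied from your own browser session. Note that `'no-images'` maps to wkhtmltopdf's `--no-images` flag, which tells it not to load images, so it is left out here.

```python
from typing import Dict, List, Optional, Tuple, Union

OptionValue = Union[None, str, List[Tuple[str, str]]]


def build_options(cookies: Dict[str, str]) -> Dict[str, OptionValue]:
    """Build a pdfkit options dict that forwards browser cookies (hypothetical helper)."""
    return {
        # A list of (name, value) tuples becomes one "--cookie name value" per entry.
        "cookie": list(cookies.items()),
        # Kept from the question: allow access to local files.
        "enable-local-file-access": None,
        # 'no-images' is deliberately absent: it would pass --no-images.
    }


def render(url: str, out_path: str, cookies: Dict[str, str],
           wkhtmltopdf_path: Optional[str] = None):
    """Render url to out_path with the given cookies (hypothetical helper)."""
    import pdfkit  # imported here so build_options works without pdfkit installed

    config = None
    if wkhtmltopdf_path is not None:
        config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
    return pdfkit.from_url(url, out_path, configuration=config,
                           options=build_options(cookies))
```

Calling `render(url, 'out.pdf', {'__cmpconsentx47085': '...', '__cmpcccx47085': '...'}, 'C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe')` with the two consent cookie values from the command above should then be roughly equivalent to that command.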

I was surprised at how focused the 7-page output was. The alternative is to run Edge headless, which includes all the advertising sidebars etc., but is easier to run headless if you accept the cookies first.

"C:\Program Files\Microsoft\Edge\Application\msedge.exe" --headless=old --print-to-pdf="%cd%\out.pdf" --enable-logging --print-to-pdf-no-header --run-all-compositor-stages-before-draw "https://stadt-bremerhaven.de/eve-flare-portables-stimmungslicht-mit-thread-ausprobiert/"

NOTE: your "Program Files" location may differ depending on language or 32/64-bit installation, so check which one you use; it may be \Program Files (x86)\. The result should be more in keeping with what a browser shows.

[screenshot]
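If you would rather drive Edge from Python than from a batch file, the command above can be wrapped with `subprocess`. This is only a sketch using the same flags as the command line shown above; `edge_pdf_command` and `print_with_edge` are hypothetical names, and the path to `msedge.exe` is something you have to supply for your own machine.

```python
import subprocess
from pathlib import Path
from typing import List


def edge_pdf_command(edge_path: str, url: str, out_path: str) -> List[str]:
    """Build the headless Edge argument list (same flags as the command above)."""
    return [
        edge_path,
        "--headless=old",
        f"--print-to-pdf={out_path}",
        "--enable-logging",
        "--print-to-pdf-no-header",
        "--run-all-compositor-stages-before-draw",
        url,
    ]


def print_with_edge(edge_path: str, url: str, out_path: str = "out.pdf"):
    """Run Edge headless and write the PDF; raises CalledProcessError on failure."""
    # Resolve to an absolute path, mirroring the %cd%\out.pdf used on the command line.
    absolute_out = str(Path(out_path).resolve())
    return subprocess.run(edge_pdf_command(edge_path, url, absolute_out), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the space in "Program Files".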

If you download the HTML with curl, e.g. curl -o get.html "https://stadt-bremerhaven.de/eve-flare-portables-stimmungslicht-mit-thread-ausprobiert/", you can of course alter the HTML however you wish before printing.

[screenshot]

Note that you can also edit the wording or image sizes. [screenshot]
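The download-then-edit route can also be done in Python. The sketch below fetches the page, strips `<script>` blocks as one example of an edit, adds a `<base>` tag so relative image URLs still resolve, and hands the result to `pdfkit.from_string`. All helper names are hypothetical, and removing scripts is just an illustrative choice: on a page that lazy-loads its images via JavaScript it could make things worse.

```python
import re
import urllib.request
from typing import Optional


def strip_scripts(html: str) -> str:
    """Remove <script>...</script> blocks (one example of an edit)."""
    return re.sub(r"<script\b.*?</script\s*>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)


def add_base_href(html: str, page_url: str) -> str:
    """Insert a <base> tag after the opening <head> so relative URLs resolve."""
    base_tag = f'<base href="{page_url}">'
    return re.sub(r"(<head\b[^>]*>)", lambda m: m.group(1) + base_tag,
                  html, count=1, flags=re.IGNORECASE)


def fetch_html(url: str) -> str:
    """Download the page, decoding with the declared charset (UTF-8 fallback)."""
    with urllib.request.urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def render_edited(url: str, out_path: str, wkhtmltopdf_path: Optional[str] = None):
    """Fetch, edit, and print the page (hypothetical helper)."""
    import pdfkit  # imported here so the pure helpers work without pdfkit installed

    html = add_base_href(strip_scripts(fetch_html(url)), url)
    config = None
    if wkhtmltopdf_path is not None:
        config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
    return pdfkit.from_string(html, out_path, configuration=config)
```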

K J