When I use the pdf2image Python package and pass a multi-page PDF to convert_from_bytes() or convert_from_path(), the returned list does contain multiple images, but every image shows the last PDF page (I would have expected each image to represent one of the PDF pages).
The output looks something like this:
Any idea why this would occur? I can't find a solution online. I found a vague suggestion that the use_cropbox argument might help, but changing it has no effect.
def convert(self, opened_file):
    # Read PDF and convert pages to PPM image objects
    try:
        _ppm_pages = self.pdf2image.convert_from_bytes(
            opened_file.read(),
            grayscale=True
        )
    except Exception as e:
        print(f"[CreateJPEG] Could not convert PDF pages to JPEG image due to error: \n '{e}'")
        return
    # Do stuff with _ppm_pages
    for img in _ppm_pages:
        img.show()  # ...all images in that list are of the last page
Sometimes the output is an empty 1x1 image instead, and I haven't found a reason for that either. If you have any idea what causes it, please let me know!
Thanks in advance, Simon
EDIT: Added code.
EDIT: So, when I try this in a random notebook, it actually works fine.
I've removed a few detours I used in my original code, and now it works. Still not sure what the underlying reason was though...
All the same, thanks for your help, everyone!
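For anyone who runs into the same symptom, here is a minimal sketch of the stripped-down call together with a quick check that the returned pages really differ. It only assumes that each page object has a tobytes() method, which PIL images do. The helper names (page_digests, all_pages_distinct, convert_pdf) are made up for illustration and are not part of pdf2image. Note that two pages that are genuinely identical (for example two blank pages) would also be reported as duplicates.

```python
import hashlib


def page_digests(pages):
    """Return one SHA-256 hex digest per page image.

    Each item only needs a tobytes() method, which PIL images provide.
    """
    return [hashlib.sha256(page.tobytes()).hexdigest() for page in pages]


def all_pages_distinct(pages):
    """Return True if no two page images have identical raw pixel data."""
    digests = page_digests(pages)
    return len(set(digests)) == len(digests)


def convert_pdf(pdf_bytes):
    """Convert PDF bytes to a list of PIL images, one per page.

    Hypothetical wrapper: it calls pdf2image directly, with no class
    attribute or other indirection in between.
    """
    # Imported here so the helpers above also work without pdf2image installed.
    from pdf2image import convert_from_bytes

    return convert_from_bytes(pdf_bytes, grayscale=True)
```

Usage would look roughly like this: open the PDF with open(path, "rb"), pass f.read() to convert_pdf(), and then call all_pages_distinct() on the result. If that returns False for a PDF whose pages clearly differ, the problem is in the conversion itself. If it returns True here but not in the original program, the surrounding code is the more likely culprit.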