
The PDF is a scanned document, and I have not yet found a way to pull out the individual images. I have tried setting the crop box and media box, but that returns each entire page as one image. I have also tried other parsing libraries such as pdfminer.six, with the same result: the whole page is extracted.

I tried using the media box and crop box in the hope of grabbing only the specified region, but the entire page is extracted instead.

1 Answer


If the document is scanned, the whole page is a single image. So all libraries will give you that.

As the maintainer of pypdf and PyPDF2, I can tell you that there is no way around that.
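To check whether a file behaves this way, one option is to count the embedded images on each page. This is a minimal sketch, assuming pypdf is installed; the function names and the "exactly one image per page" rule are illustrative assumptions, and a non-scanned PDF can also have one image per page.

```python
def count_images_per_page(pdf_path):
    """Return the number of embedded images on each page of the PDF."""
    # pypdf is a third-party package; it is imported here so the helper
    # below can be used without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [len(page.images) for page in reader.pages]


def looks_scanned(image_counts):
    """Heuristic (an assumption, not a guarantee): a scan usually stores
    exactly one full-page image per page."""
    return bool(image_counts) and all(n == 1 for n in image_counts)
```

For example, `looks_scanned(count_images_per_page("scan.pdf"))` would report whether every page holds a single image, where `scan.pdf` is a placeholder path.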

If you want the illustrations within that page image as separate files, you need machine learning or an image cropping tool.
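The cropping route can be sketched as follows, assuming the page image has already been saved to disk and Pillow is installed. The function names and the fractional box convention (left, top, right, bottom as fractions of the page size) are hypothetical choices for illustration; the box itself still has to come from you or from a detection model.

```python
def fractional_box_to_pixels(width, height, box):
    """Convert a (left, top, right, bottom) box given as fractions of the
    page size into integer pixel coordinates."""
    left, top, right, bottom = box
    if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
        raise ValueError("box must satisfy 0 <= left < right <= 1 and 0 <= top < bottom <= 1")
    return (
        int(round(left * width)),
        int(round(top * height)),
        int(round(right * width)),
        int(round(bottom * height)),
    )


def crop_illustration(page_image_path, box, out_path):
    """Crop one region out of a saved page image and write it to out_path."""
    # Pillow is a third-party package; it is imported here so the helper
    # above can be used without it.
    from PIL import Image

    with Image.open(page_image_path) as img:
        pixel_box = fractional_box_to_pixels(img.width, img.height, box)
        img.crop(pixel_box).save(out_path)
```

Fractions are used instead of pixels so the same box works for pages scanned at different resolutions.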

Martin Thoma