Questions tagged [pdf2image]

A wrapper around the pdftoppm and pdftocairo command line tools to convert PDF to a PIL Image list.

pdf2image is a Python package that wraps the Poppler command-line tools pdftoppm and pdftocairo to convert each page of a PDF into a PIL Image object.
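Because pdf2image shells out to Poppler, it can help to see roughly what a conversion looks like at the command-line level. The sketch below builds a `pdftoppm` argument list in Python. The flags used (`-png`, `-r`, `-f`, `-l`) are standard pdftoppm options, but the helper names `build_pdftoppm_cmd` and `render_with_pdftoppm` are hypothetical, and this is not pdf2image's actual internal code.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftoppm_cmd(
    pdf_path: str,
    out_prefix: str,
    dpi: int = 200,
    first_page: Optional[int] = None,
    last_page: Optional[int] = None,
) -> List[str]:
    """Build a pdftoppm command that renders pages of pdf_path to PNG files.

    Hypothetical helper for illustration; pdf2image assembles its own
    arguments internally.
    """
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to render (1-based)
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to render (inclusive)
    # pdftoppm writes one file per page, appending the page number to out_prefix
    cmd += [pdf_path, out_prefix]
    return cmd


def render_with_pdftoppm(pdf_path: str, out_prefix: str, dpi: int = 200) -> None:
    """Run pdftoppm directly; requires Poppler's pdftoppm on PATH."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found on PATH; install Poppler first")
    subprocess.run(build_pdftoppm_cmd(pdf_path, out_prefix, dpi), check=True)
```

In pdf2image itself, a roughly equivalent request is `convert_from_path("input.pdf", dpi=150, first_page=1, last_page=3)`, which returns a list of PIL images instead of leaving files on disk.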


71 questions
1 vote, 1 answer

cannot identify image file <_io.BytesIO object at 0x7f8bbdc115f0>

I tried to convert a PDF to an image in Colab. It was working fine until yesterday but stopped working today, and I'm not sure what causes the issue. from pdf2image import convert_from_path import glob pdf_dir = glob.glob(r'/content/first_page.pdf') img_dir =…
Pravin
1 vote, 1 answer

pdf2image converts page with page information that is not visible in the pdf viewer

I convert PDFs to images using pdf2image, a Python package. In the result, PDF page information(?) that is not visible in a PDF viewer appears. How can I remove the page information from the PDF, not from the image? The PDF file link is…
yes89929
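One possible cause, and it is an assumption since the linked PDF is not available here, is content placed outside the PDF's crop box: viewers typically display the crop box, while pdftoppm renders the media box by default. pdf2image's `use_cropbox` parameter asks Poppler to render the crop box instead. The helper names below are hypothetical, and pdf2image is imported lazily so the module loads even where it isn't installed.

```python
from typing import Any, Dict, List


def visible_area_kwargs(dpi: int = 200) -> Dict[str, Any]:
    """Keyword arguments asking pdf2image to render only the crop box.

    use_cropbox=True makes Poppler render the CropBox (what most viewers show)
    instead of the default MediaBox, which may hold extra page information.
    """
    return {"dpi": dpi, "use_cropbox": True}


def render_visible_area(pdf_path: str, dpi: int = 200) -> List[Any]:
    """Hypothetical helper: convert pdf_path to a list of PIL images."""
    from pdf2image import convert_from_path  # imported lazily; needs Poppler

    return convert_from_path(pdf_path, **visible_area_kwargs(dpi))
```

If the unwanted content turns out to sit inside the crop box (for example, as an annotation), this option will not remove it.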
1 vote, 1 answer

Using pdf2image with Node.js and CentOS

I'm using pdf2image to build a Node.js application that converts PDF files to PNG. As the readme of the official repo says, pdf2image requires two external dependencies: Ghostscript and GraphicsMagick. The two are installed in my local Windows…
aymen2709
1 vote, 2 answers

"Unable to get page count. Is poppler installed and in PATH?"

I am trying to import pdf2image but end up with this error: "Unable to get page count. Is poppler installed and in PATH?" I am using Anaconda 2.1.4 and Jupyter Notebook 6.4.5.
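The error comes from pdf2image failing to run Poppler's `pdfinfo`, which it calls to count pages before rendering. A quick standard-library check, sketched below with a hypothetical helper `missing_poppler_tools`, shows which Poppler executables are visible, either on PATH or in a specific folder you intend to pass as `poppler_path`.

```python
import shutil
from typing import List, Optional

# Poppler executables that pdf2image may call
POPPLER_TOOLS = ("pdfinfo", "pdftoppm", "pdftocairo")


def missing_poppler_tools(poppler_path: Optional[str] = None) -> List[str]:
    """Return the Poppler tools that cannot be found.

    If poppler_path is given, look only in that directory (mirroring what
    passing poppler_path=... to pdf2image does); otherwise search PATH.
    """
    missing = []
    for tool in POPPLER_TOOLS:
        # shutil.which also resolves Windows extensions such as .exe
        if shutil.which(tool, path=poppler_path) is None:
            missing.append(tool)
    return missing
```

If the tools are missing inside an Anaconda environment, installing the conda-forge `poppler` package (`conda install -c conda-forge poppler`) is one option. With a manually downloaded Windows build, pass its `bin` folder instead, e.g. `convert_from_path("file.pdf", poppler_path=r"C:\path\to\poppler\bin")`, where the path is a placeholder.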
1 vote, 0 answers

Why is pdftocairo not getting installed when installing poppler?

I am using the following command in the Dockerfile to install poppler: RUN wget --directory-prefix=~ poppler.freedesktop.org/poppler-22.04.0.tar.xz -O /tmp/poppler-22.04.0.tar.xz && apt-get install xz-utils && tar xf /tmp/poppler-22.04.0.tar.xz -C…
abhinav kumar
1 vote, 2 answers

How do I convert a multi-page PDF into one PNG image per page in Python?

Amateur Python developer here. I'm working on a project where I take multiple PDFs, each with a varying number of pages (1-20ish), and turn them into PNG files to use with pytesseract later. I'm using pdf2image and poppler on a test PDF that has…
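`convert_from_path` already returns one PIL image per page, so writing one PNG per page is a loop over that list. The sketch below is one way to do it; `page_filename` and `pdf_to_pngs` are hypothetical helpers, and `dpi=300` is an assumed choice for OCR input (pdf2image's default is 200).

```python
from pathlib import Path
from typing import List


def page_filename(stem: str, page_number: int, digits: int = 3) -> str:
    """Zero-padded file name so pages sort correctly, e.g. report_page001.png."""
    return f"{stem}_page{page_number:0{digits}d}.png"


def pdf_to_pngs(pdf_path: str, out_dir: str, dpi: int = 300) -> List[Path]:
    """Save every page of pdf_path as its own PNG and return the written paths."""
    from pdf2image import convert_from_path  # lazy import; needs Poppler installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem
    written: List[Path] = []
    # convert_from_path returns one PIL Image per page, in page order
    for number, image in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        target = out / page_filename(stem, number)
        image.save(target, "PNG")
        written.append(target)
    return written
```

If the PNGs are only an intermediate step for pytesseract, note that `pytesseract.image_to_string` also accepts PIL images directly, so saving to disk is optional.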
1 vote, 0 answers

Is there a way to include poppler files with a mac app developed using python

I'm developing a macOS app in Python that has a step to convert PDF to image. It uses pdf2image, which depends on poppler. I want the app to have no external dependencies (such as installing Homebrew and…
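A common pattern, assuming the app is frozen with PyInstaller, is to ship the Poppler binaries inside the bundle and point pdf2image's `poppler_path` at them at runtime. PyInstaller exposes the bundle's data directory as `sys._MEIPASS`; the `poppler/bin` subfolder and the helper name `bundled_poppler_dir` are hypothetical and depend on how you add the files (for example with PyInstaller's `--add-binary`). Homebrew-built Poppler binaries also link against shared libraries, so those dylibs would need to be bundled too; that part is not shown.

```python
import sys
from pathlib import Path
from typing import Optional


def bundled_poppler_dir(base: Optional[Path] = None) -> Path:
    """Locate Poppler binaries shipped inside the app bundle.

    Assumes PyInstaller, which sets sys._MEIPASS in a frozen app; when running
    from source, falls back to the current working directory. The
    "poppler/bin" layout is a hypothetical choice made at build time.
    """
    if base is None:
        base = Path(getattr(sys, "_MEIPASS", Path.cwd()))
    return base / "poppler" / "bin"
```

At conversion time the result would be passed along as `convert_from_path(pdf_path, poppler_path=str(bundled_poppler_dir()))`.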
1 vote, 2 answers

pdf2image conversion of multi page PDFs to images returns the last page on all images

When I use the pdf2image Python package and pass a multi-page PDF into the convert_from_bytes() or convert_from_path() method, the output array does contain multiple images, but all of them show the last PDF page (whereas I would've expected…
1 vote, 1 answer

pdf2image wrong font and crop text

I am converting my PDF into an image in Python with convert_from_path from the pdf2image library. [Screenshots: the original PDF and the generated image.] As you can see, the issue is that the font in the image is not the correct one and also that…
LCMa
1 vote, 0 answers

Error with the pdf2pic package in a Node.js Azure Function

I'm building an Azure Function that triggers when a PDF is uploaded to Blob Storage, converts the PDF into a JPG file and saves it to another storage location. I saw that the pdf2pic library is good at PDF conversion, so I installed GraphicsMagick, Ghostscript…
1 vote, 0 answers

Getting an empty list for pdf2image (convert_from_bytes)

I am trying to convert PDFs to images using the convert_from_bytes function from the pdf2image package. I have the application on two servers: on one it converts correctly, but on the other it returns an empty list []. I…
Annabelle79
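Since the same code works on one server, a reasonable first step (a diagnostic sketch, not a known fix) is to compare the two environments: confirm the bytes really are a PDF, then ask Poppler how many pages it sees through pdf2image's `pdfinfo_from_bytes`, which returns a dict that includes a `"Pages"` entry. A difference in Poppler version or availability between the servers is a plausible but unconfirmed cause. `looks_like_pdf` and `diagnose` are hypothetical helpers.

```python
from typing import Any, Dict

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check for the %PDF- header.

    Readers such as Acrobat tolerate the header anywhere in the first 1024
    bytes, so this searches that window rather than only offset 0.
    """
    return PDF_MAGIC in data[:1024]


def diagnose(data: bytes) -> Dict[str, Any]:
    """Collect facts worth comparing between the working and failing server."""
    report: Dict[str, Any] = {"size": len(data), "looks_like_pdf": looks_like_pdf(data)}
    if not report["looks_like_pdf"]:
        return report  # likely an upload or read problem, not a Poppler problem
    from pdf2image import pdfinfo_from_bytes  # lazy import; needs Poppler

    report["pages_seen_by_poppler"] = pdfinfo_from_bytes(data)["Pages"]
    return report
```

Running `diagnose` on identical input on both servers and comparing the reports narrows down whether the input or the Poppler installation differs.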
1 vote, 2 answers

How do I convert multiple PDFs into images from the same folder in Python?

from pdf2image import convert_from_path images = convert_from_path('path.pdf',poppler_path=r"E:/software/poppler-0.67.0/bin") for i in range(len(images)): images[i].save('image_name'+ str(i) +'.jpg', 'JPEG') But now I want to convert more…
techMayu
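Extending the single-file snippet in the question to a whole folder mainly needs two changes: iterate over the PDFs, and include each PDF's name in the output file name so pages from different PDFs don't overwrite each other. The sketch below assumes a flat folder; `find_pdfs` and `convert_folder` are hypothetical helpers, and `poppler_path` is passed through exactly as in the question.

```python
from pathlib import Path
from typing import List, Optional


def find_pdfs(folder: str) -> List[Path]:
    """Return the PDF files directly inside folder, matching .pdf case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_folder(folder: str, out_dir: str, poppler_path: Optional[str] = None) -> int:
    """Convert every PDF in folder to JPEGs named <pdf stem>_<page index>.jpg.

    Returns the number of images written.
    """
    from pdf2image import convert_from_path  # lazy import; needs Poppler

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for pdf in find_pdfs(folder):
        pages = convert_from_path(str(pdf), poppler_path=poppler_path)
        for i, page in enumerate(pages):
            # the PDF's stem keeps file names unique across documents
            page.save(out / f"{pdf.stem}_{i}.jpg", "JPEG")
            count += 1
    return count
```

For nested folders, `Path(folder).rglob("*.pdf")` could replace `iterdir()`, though it matches the suffix case-sensitively on most platforms.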
1 vote, 2 answers

Render PDF into an image (self-contained, no external command line dependencies) (to use on AWS Lambda)

I need a simple Python library to convert PDF to image (render the PDF as is), but after hours of searching I keep hitting the same wall: I find libraries like the pdf2image Python library (and many similar ones), which depend on external applications…
Saw
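One way around the external-binary problem is to swap pdf2image for a library that embeds its renderer. The sketch below uses PyMuPDF (a different library from pdf2image, named here as a substitute technique), whose wheels bundle MuPDF, so no Poppler executables are needed. Two caveats: PyMuPDF is AGPL-licensed (commercial licenses are available), and for Lambda the installed wheel must match the Lambda runtime's platform. `zoom_for_dpi` and `render_pdf_bytes_to_pngs` are hypothetical helper names.

```python
from typing import List

POINTS_PER_INCH = 72  # PDF user-space units are 1/72 inch


def zoom_for_dpi(dpi: int) -> float:
    """Scale factor that turns PDF points into pixels at the requested DPI."""
    return dpi / POINTS_PER_INCH


def render_pdf_bytes_to_pngs(pdf_bytes: bytes, dpi: int = 150) -> List[bytes]:
    """Render each page of an in-memory PDF to PNG bytes using PyMuPDF."""
    import fitz  # PyMuPDF; newer releases also expose it as "pymupdf"

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)
    pngs: List[bytes] = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)  # rasterize the page
            pngs.append(pix.tobytes("png"))
    return pngs
```

Working on bytes rather than paths fits a Lambda handler that reads the PDF from S3 into memory, though that integration is not shown here.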
1 vote, 1 answer

How to install python-poppler

I downloaded poppler from https://github.com/oschwartz10612/poppler-windows/releases/tag/v21.03.0 and tried to install it with pip install python-poppler in Command Prompt. It failed with the error: Running setup.py clean for python-poppler Failed to build…
nilsinelabore
1 vote, 1 answer

Time efficient way to convert PDF to image

Context: I have PDF files I'm working with. I'm using OCR to extract the text from these documents, and to do that I have to convert my PDF files to images. I currently use the convert_from_path function of the pdf2image module but it…
zanga
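pdf2image exposes several options that commonly affect speed and memory: rendering only a page range (`first_page`/`last_page`), `thread_count` (which splits the pages across several Poppler processes), a lower `dpi`, and the intermediate image format `fmt`. The sketch below combines them by converting in batches. `page_batches` and `iter_pages` are hypothetical helpers, and the defaults (150 DPI, 4 threads, batches of 10) are arbitrary starting points to tune against your OCR accuracy, not measured recommendations.

```python
from typing import Any, Iterator, Tuple


def page_batches(page_count: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (first_page, last_page) ranges covering all pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


def iter_pages(pdf_path: str, batch_size: int = 10, dpi: int = 150, threads: int = 4) -> Iterator[Any]:
    """Yield PIL images one batch at a time so a long PDF never sits fully in memory."""
    from pdf2image import convert_from_path, pdfinfo_from_path  # lazy; needs Poppler

    pages = pdfinfo_from_path(pdf_path)["Pages"]
    for first, last in page_batches(pages, batch_size):
        yield from convert_from_path(
            pdf_path,
            dpi=dpi,  # fewer pixels per page: faster rendering and OCR
            first_page=first,
            last_page=last,
            thread_count=threads,  # parallel Poppler processes within the batch
            fmt="jpeg",  # intermediate format; PNG compression is usually slower
        )
```

Each yielded image can go straight to the OCR step, so memory use stays bounded by one batch rather than the whole document.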