I need to convert a .pdf file to .jpeg so that I can run OCR on the text. I found this code:

from pdf2image import convert_from_path
pages = convert_from_path('img732.pdf', 500)
for i, page in enumerate(pages):
  page.save(f'out_{i}.jpg', 'JPEG')

And I got this error:

Traceback (most recent call last):
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\site-package\pdf2image\pdf2image.py", line 441, in pdfinfo_from_path
proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\david\OneDrive\Desktop\SMEpy\prova!!!.py", line 2, in <module>
pages = convert_from_path('img732.pdf', 500)
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\site-packages\pdf2image\pdf2image.py", line 97, in convert_from_path
page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\site-packages\pdf2image\pdf2image.py", line 467, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

The .pdf file is in the same directory as the .py file. What is the problem?

Asked by daav_v

1 Answer

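The problem is with the library setup, not with your code or the location of the PDF. pdf2image relies on the Poppler command-line tools (such as pdfinfo), and the traceback shows that it could not find them. Installing Poppler and making it available on PATH should get the script running:

  1. Download the Poppler tools for Windows (I recommend the latest version):
    http://blog.alivate.com.au/poppler-windows/
  2. Extract the download to a folder anywhere on your disk.
  3. Add the path of Poppler's "bin" folder to the PATH environment variable.
  4. Restart your Python workspace so that it picks up the new PATH.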
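
To confirm that the steps above worked, you can ask Python whether it can see the Poppler executables. This is only a minimal sketch: `find_poppler_tools` is a hypothetical helper name, and it assumes that `pdfinfo` and `pdftoppm` are the tools your pdf2image version calls.

```python
import shutil


def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    # shutil.which searches the PATH of the current process; on Windows it
    # also tries the usual executable extensions such as .exe.
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, location in find_poppler_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found after you edited PATH, the workspace was probably started before the change, which is why step 4 matters.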
Answered by eminaruk
  • Sorry, I didn't understand step 3. I downloaded and extracted the poppler folder. What should I do with the "bin" folder now, and is there a specific place where the poppler folder should be saved? – daav_v May 01 '21 at 18:20
  • @daav_v: I believe **3** means to add the path of the `bin` sub-directory of the folder poppler was installed into to the `PATH` environment variable. – martineau May 01 '21 at 18:30
  • @martineau @mucidix Sorry, I still don't understand. Do I need to add the 'bin' path in my code, like this: `from pdf2image import convert_from_path popplerBin = r'C:\Users\david\AppData\Local\Programs\poppler-0.68.0\bin' pages = convert_from_path('img732.pdf', 500)` ? – daav_v May 01 '21 at 18:49
  • @daav_v: (correction) No, it's something you will need to do to configure your Windows computer after installing poppler. See [Add to the PATH on Windows 10](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/). – martineau May 01 '21 at 19:00
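
As an alternative to editing PATH, the traceback in the question shows that `convert_from_path` forwards a `poppler_path` argument, so the "bin" folder can also be passed directly from the script. The sketch below assumes that; `pdf_to_jpegs` and `page_filename` are hypothetical helper names, and the folder is the one from the comment above, which may differ on your machine. It also numbers the output files, since saving every page under the same name would leave only the last page on disk.

```python
from pathlib import Path


def page_filename(stem, page_number):
    """Build a distinct JPEG name per page, e.g. img732_page001.jpg."""
    return f"{stem}_page{page_number:03d}.jpg"


def pdf_to_jpegs(pdf_path, poppler_bin=None, dpi=500):
    """Convert each page of a PDF to its own JPEG and return the file names."""
    # Imported here so the helper above can be used without pdf2image installed.
    from pdf2image import convert_from_path

    # poppler_bin=None makes pdf2image look for Poppler on PATH instead.
    pages = convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_bin)
    stem = Path(pdf_path).stem
    saved = []
    for number, page in enumerate(pages, start=1):
        name = page_filename(stem, number)
        page.save(name, "JPEG")
        saved.append(name)
    return saved


if __name__ == "__main__":
    # Assumed location, copied from the comment above; adjust as needed.
    POPPLER_BIN = r"C:\Users\david\AppData\Local\Programs\poppler-0.68.0\bin"
    print(pdf_to_jpegs("img732.pdf", poppler_bin=POPPLER_BIN))
```

Setting PATH once, as described in the answer, is still the more convenient option if several scripts need Poppler.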