Hi, I am facing issues while trying to convert PDF files to JPEG. I am running Python from the Anaconda distribution on a Windows machine.
Below is the code, which works for some of the PDFs:
import os
from wand.image import Image as wi

pdf_dir = r"C:\\Users\Downloads\python computer vison\Computer-Vision-with-Python\pdf_to_convert"
os.chdir(pdf_dir)
path = r"C:/Users/Downloads/python computer vison/Computer-Vision-with-Python/jpeg_extract/"
for pdf_file in os.listdir(pdf_dir):
    print("filename is ", pdf_file)
    # Rasterize the PDF at 300 DPI
    pdf = wi(filename=pdf_file, resolution=300)
    #print("filename is ",pdf_file)
    pdfImage = pdf.convert("jpeg")
    i = 1
    # Save every page of the PDF as its own numbered .jpg
    for img in pdfImage.sequence:
        page = wi(image=img)
        page.save(filename=path+pdf_file+str(i)+".jpg")
        i += 1
And below is the output:
filename is tmpdocument-page0.pdf
filename is tmpdocument-page1.pdf
filename is tmpdocument-page100.pdf
filename is tmpdocument-page1000.pdf
filename is tmpdocument-page1001.pdf
filename is tmpdocument-page1002.pdf
filename is tmpdocument-page1003.pdf
filename is tmpdocument-page1004.pdf
filename is tmpdocument-page1005.pdf
filename is tmpdocument-page1006.pdf
filename is tmpdocument-page1007.pdf
filename is tmpdocument-page1008.pdf
filename is tmpdocument-page1009.pdf
filename is tmpdocument-page1012.pdf
---------------------------------------------------------------------------
CorruptImageError Traceback (most recent call last)
<ipython-input-7-84715f25da7c> in <module>()
8 #path = r"C://Users/Downloads/Work /ml_training_samples/tmp/"
9 print("filename is ",pdf_file)
---> 10 pdf = wi(filename=pdf_file,resolution=300)
11 #print("filename is ",pdf_file)
12 pdfImage = pdf.convert("jpeg")
~\Anaconda3\envs\python-cvcourse\lib\site-packages\wand\image.py in __init__(self, image, blob, file, filename, format, width, height, depth, background, resolution, pseudo)
4706 self.read(blob=blob, resolution=resolution)
4707 elif filename is not None:
-> 4708 self.read(filename=filename, resolution=resolution)
4709 # clear the wand format, otherwise any subsequent call to
4710 # MagickGetImageBlob will silently change the image to this
~\Anaconda3\envs\python-cvcourse\lib\site-packages\wand\image.py in read(self, file, filename, blob, resolution)
5000 r = library.MagickReadImage(self.wand, filename)
5001 if not r:
-> 5002 self.raise_exception()
5003
5004 def save(self, file=None, filename=None):
~\Anaconda3\envs\python-cvcourse\lib\site-packages\wand\resource.py in raise_exception(self, stacklevel)
220 warnings.warn(e, stacklevel=stacklevel + 1)
221 elif isinstance(e, Exception):
--> 222 raise e
223
224 def __enter__(self):
CorruptImageError: unable to read image data `C:/Users/AppData/Local/Temp/magick-40700dP2k-1ORw81R1' @ error/pnm.c/ReadPNMImage/1346
Background: I have a PDF image document, which I named tmpdocument, with over 2200 pages, so I split it into individual PDF documents using Python. Now I am trying to convert them into JPEG.
Problem:
When I try to convert the PDFs into JPEG, some pages succeed and some fail with the above error. Since all these pages come from the same document, I highly doubt this is a format issue. I am also able to open and view the page in Adobe, so I'm sure the page is not corrupted.
Lastly, ImageMagick takes up a lot of disk space, and now there is this issue on top of it. I am truly lost. Is there any other way to achieve the above scenario? Any input would be helpful.
Thanks.
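To narrow down which pages fail, the loop can be made to continue past a failure and record it instead of stopping at the first error. The following is a minimal sketch, not a fix for the underlying error: `convert_all` and `wand_convert` are hypothetical helper names, and the `with` blocks rest on the assumption that closing each Wand image promptly may reduce temporary disk usage.

```python
import os


def convert_all(pdf_files, convert_one):
    """Call convert_one on every file; return (succeeded, failed) lists.

    failed holds (filename, error text) pairs so the bad pages can be
    inspected afterwards instead of aborting the whole batch.
    """
    succeeded, failed = [], []
    for pdf_file in pdf_files:
        try:
            convert_one(pdf_file)
            succeeded.append(pdf_file)
        except Exception as exc:  # e.g. wand's CorruptImageError
            failed.append((pdf_file, repr(exc)))
    return succeeded, failed


def wand_convert(pdf_file, out_dir, resolution=300):
    """Convert one PDF to JPEG pages with Wand (hypothetical helper)."""
    # Imported here so convert_all can be used without Wand installed.
    from wand.image import Image as wi

    with wi(filename=pdf_file, resolution=resolution) as pdf:
        with pdf.convert("jpeg") as pdf_image:
            for i, img in enumerate(pdf_image.sequence, start=1):
                with wi(image=img) as page:
                    page.save(filename=os.path.join(out_dir, f"{pdf_file}{i}.jpg"))
```

It could be called as `ok, bad = convert_all(os.listdir(pdf_dir), lambda f: wand_convert(f, path))`, after which `bad` lists the files that raised an error together with the message.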
Updated
Thanks for the reply. Yes, I am using Ghostscript 9.26. The PDF contains sensitive data, so unfortunately I can't post it online. The temp folder is 18 MB, so I think that is okay.
I found some code online. It generates the JPEG files, but it replaces them rather than creating new files. I have never used subprocess before, and this code gives no visibility into whether the program is running or has failed, or how to kill it. Any input here is also appreciated.
I understand it no longer uses ImageMagick, but I am okay with that as long as I can generate the JPEGs.
import os, subprocess

pdf_dir = r"C:\\Users\Downloads\latest_python\python computer vison\Computer-Vision-with-Python\pdf_to_convert"
os.chdir(pdf_dir)
pdftoppm_path = r"C:\Program Files\poppler-0.68.0_x86\poppler-0.68.0\bin\pdftoppm.exe"
i = 1
for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith(".pdf"):
        subprocess.Popen('"%s" -jpeg %s out' % (pdftoppm_path, pdf_file))
        i += 1
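The files are replaced because every call passes the same output prefix, `out`, so each single-page PDF writes to the same name. `Popen` also returns immediately without waiting, which is why there is no sign of success or failure. A minimal sketch of one way around both, assuming Python 3.7+ and the `pdftoppm` options `-jpeg` and `-r`: the helper names `build_pdftoppm_command` and `convert_with_pdftoppm`, the 300 DPI default, and the 120-second timeout are illustrative choices, not part of the original code.

```python
import os
import subprocess


def build_pdftoppm_command(pdftoppm_path, pdf_file, out_dir, dpi=300):
    """Build an argument list whose output prefix is unique per PDF."""
    stem = os.path.splitext(os.path.basename(pdf_file))[0]
    prefix = os.path.join(out_dir, stem)  # e.g. out_dir/tmpdocument-page12
    # A list avoids manual quoting of paths that contain spaces.
    return [pdftoppm_path, "-jpeg", "-r", str(dpi), pdf_file, prefix]


def convert_with_pdftoppm(pdftoppm_path, pdf_file, out_dir, timeout=120):
    """Run pdftoppm, wait for it to finish, and return (ok, stderr_text)."""
    cmd = build_pdftoppm_command(pdftoppm_path, pdf_file, out_dir)
    try:
        # run() blocks until the process exits; on timeout it kills the
        # child process and raises TimeoutExpired.
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timed out after %s seconds" % timeout
    return result.returncode == 0, result.stderr
```

Calling `convert_with_pdftoppm(pdftoppm_path, pdf_file, path)` inside the loop and printing the returned pair for each file would show which conversions failed and why, one file at a time.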