I'm trying to extract text from a PDF, so first I have to convert it to images. This works, but only for one PDF with a specific name. If I add another PDF to the folder, or rename the one I already have, I get this error:

pdf2image.exceptions.PDFPageCountError: Unable to get page count. I/O Error: Couldn't open file 'LoremIpsun.pdf': No error.

This is the part of the code I'm having trouble with:

from pdf2image import convert_from_path
import os


def pdf_a_txt(route):
    target = route
    for root, dirnames, files in os.walk(target):
        for x in files:
            if x.endswith('.pdf'):
                pages = convert_from_path(x, 500, poppler_path='C:\\Users\\User\\Desktop\\poppler-22.04.0\\Library\\bin')
                image_counter = 1

                for page in pages:
                    filename = "page_"+str(image_counter)+".jpg"
                    page.save(root+'\\'+ filename, 'JPEG')
                    image_counter = image_counter + 1


pdf_a_txt('C:\\Users\\User\\Desktop\\Test\\Input')

I'm using a PDF named "LoremIpsum.pdf". If I put another PDF inside the Input folder, the script only opens LoremIpsum.pdf. When it finishes converting that one and tries to open the other one, I get the error above. If I rename "LoremIpsum.pdf" to something different, like "LoremIpsun.pdf", it can't be opened either. I know it's pretty simple code, but I can't figure out why it only works with that specific name.

Any help would be appreciated. Thanks!
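
One possible explanation, offered as a guess rather than a confirmed diagnosis: os.walk yields bare file names in files, so convert_from_path(x, ...) looks for x relative to the current working directory, not inside the Input folder. That would only succeed if a file with the same name happens to exist in the directory the script is run from. The sketch below joins root and the file name before opening the PDF. The names find_pdfs and pdf_to_images are hypothetical helpers, not part of pdf2image. The output naming is also an assumption: it prefixes each image with the PDF's base name, because with plain page_1.jpg a second PDF in the same folder would overwrite the first one's pages.

```python
import os


def find_pdfs(folder):
    """Yield the full path of every .pdf file found under folder."""
    for root, _dirnames, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(".pdf"):
                # os.walk gives bare file names; joining with root produces a
                # path that does not depend on the current working directory.
                yield os.path.join(root, name)


def pdf_to_images(folder, poppler_path=None, dpi=500):
    """Save every page of every PDF under folder as a JPEG next to the PDF."""
    # Imported here so find_pdfs can be tried without pdf2image installed.
    from pdf2image import convert_from_path

    for pdf_path in find_pdfs(folder):
        pages = convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_path)
        # Assumption: prefix with the PDF's base name so that pages from
        # different PDFs in the same folder get distinct file names.
        stem = os.path.splitext(os.path.basename(pdf_path))[0]
        out_dir = os.path.dirname(pdf_path)
        for number, page in enumerate(pages, start=1):
            page.save(os.path.join(out_dir, f"{stem}_page_{number}.jpg"), "JPEG")
```

With the paths from the question, the call would be pdf_to_images('C:\\Users\\User\\Desktop\\Test\\Input', poppler_path='C:\\Users\\User\\Desktop\\poppler-22.04.0\\Library\\bin').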

Agusms