I'm trying to extract text from a PDF, so first I have to convert it to images. That works, but only for one PDF with a specific name. If I add another PDF to the folder, or rename the PDF I already have, I get this error:
pdf2image.exceptions.PDFPageCountError: Unable to get page count. I/O Error: Couldn't open file 'LoremIpsun.pdf': No error.
This is the part of the code I'm having trouble with:
from pdf2image import convert_from_path
import os

def pdf_a_txt(route):
    target = route
    for root, dirnames, files in os.walk(target):
        for x in files:
            if x.endswith('.pdf'):
                pages = convert_from_path(x, 500, poppler_path='C:\\Users\\User\\Desktop\\poppler-22.04.0\\Library\\bin')
                image_counter = 1
                for page in pages:
                    filename = "page_"+str(image_counter)+".jpg"
                    page.save(root+'\\'+ filename, 'JPEG')
                    image_counter = image_counter + 1

pdf_a_txt('C:\\Users\\User\\Desktop\\Test\\Input')
I'm using a PDF named "LoremIpsum.pdf". If I put another PDF inside the Input folder, only LoremIpsum.pdf is opened. When the script finishes converting that one and tries to open the other one, I get the error above. If I rename "LoremIpsum.pdf" to something different, like "LoremIpsun.pdf", it can't be opened either. I know it's pretty simple code, but I can't figure out why it only works with that specific name.
Any help would be appreciated. Thanks!
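One possible explanation, offered as a sketch and not a confirmed diagnosis: os.walk yields bare file names in files, so convert_from_path(x, ...) looks for x relative to the current working directory and not inside root. That would match the symptoms if a copy of LoremIpsum.pdf happens to sit in the directory the script is run from, which is an assumption. Joining root and the file name gives a path that does not depend on the working directory. The helper names find_pdfs and pdf_to_images are hypothetical, and the output naming (PDF name plus page number) is also an assumption, chosen so that pages from different PDFs do not overwrite each other.

```python
import os


def find_pdfs(route):
    """Return the full path of every .pdf file under route (hypothetical helper)."""
    found = []
    for root, dirnames, files in os.walk(route):
        for name in files:
            if name.lower().endswith('.pdf'):
                # os.walk yields bare file names; join with root to get a usable path
                found.append(os.path.join(root, name))
    return sorted(found)


def pdf_to_images(route, poppler_path=None, dpi=500):
    """Convert each PDF under route to JPEG files saved next to it."""
    # Imported here so find_pdfs can be tried without pdf2image installed
    from pdf2image import convert_from_path

    for pdf_path in find_pdfs(route):
        pages = convert_from_path(pdf_path, dpi, poppler_path=poppler_path)
        base = os.path.splitext(pdf_path)[0]
        for number, page in enumerate(pages, start=1):
            # Assumed naming scheme: <pdf name>_page_<n>.jpg, to avoid overwrites
            page.save(f"{base}_page_{number}.jpg", 'JPEG')
```

With the paths from the question, the call would look like pdf_to_images('C:\\Users\\User\\Desktop\\Test\\Input', poppler_path='C:\\Users\\User\\Desktop\\poppler-22.04.0\\Library\\bin').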