I'm trying to put together a script that will loop through a folder of PDFs and scrape relevant information such as part names, numbers, materials, and final treatments. The (presumably) problematic part of the code is:
for fp in os.listdir(path):
    pdfFileObj = open(fp, 'rb')
    reader = PdfReader(pdfFileObj)
    number_of_pages = len(reader.pages)
    page = reader.pages[0]
    text = page.extract_text()
    Title, part_number, material, f_treatments = extractText(text)
    printAll(Title, part_number, material, f_treatments)
    pdfFileObj.close()
where path = r'C:\Users\myname\Documents\TargetFile'
It reads the first file (1.pdf) in TargetFile successfully, but raises this error when it reaches the second file (2.pdf):
[Errno 2] No such file or directory: '2.pdf'
which is peculiar, given that it needs to know that 2.pdf is in the folder in order to report this error message. I suspect that the for fp in os.listdir(path) loop is picking up the file name, but that the pdfFileObj = open(fp, 'rb') call isn't finding the file itself, as the error is reported from that line.
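One way I could test this suspicion is the minimal sketch below. It relies on the fact that os.listdir returns bare names such as '2.pdf' rather than full paths, so open(fp, 'rb') would look in the current working directory instead of in path. The temporary folder, the dummy files, and the helper list_full_paths are all hypothetical stand-ins for illustration, not part of my real code.

```python
import os
import tempfile


def list_full_paths(folder):
    """Hypothetical helper: return the full path of every entry in folder."""
    # os.listdir yields bare names such as '2.pdf', not full paths,
    # so each name is joined back onto the folder it came from.
    return [os.path.join(folder, name) for name in sorted(os.listdir(folder))]


# A throwaway folder stands in for TargetFile (assumption for the demo).
with tempfile.TemporaryDirectory() as folder:
    for name in ("1.pdf", "2.pdf"):
        with open(os.path.join(folder, name), "wb") as f:
            f.write(b"%PDF-1.4\n")  # dummy bytes, not a real PDF

    bare_names = sorted(os.listdir(folder))
    full_paths = list_full_paths(folder)

    # A bare name resolves against the current working directory,
    # so it is only found if that directory happens to contain the file.
    bare_found = [os.path.exists(name) for name in bare_names]
    # The joined path points inside the folder that was listed.
    full_found = [os.path.exists(p) for p in full_paths]

print("bare names:", bare_names, "found:", bare_found)
print("full paths found:", full_found)
```

If this is what is happening, 1.pdf may have opened only because a copy of it also sits in the directory I run the script from, and opening os.path.join(path, fp) instead of fp would be the thing to try.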
Do you know what the issue might be based on the information I've provided?
I thought that closing the document at the end of the loop would help, but this doesn't seem to make a difference. I've never worked with 'rb' (read-binary) mode before, but since it works for the first file I don't expect it to be the issue.