I'm trying to write a script that reads through a folder of PDFs and scrapes relevant information such as part names, part numbers, materials, and final treatments. The (presumably) problematic part of the code is:

for fp in os.listdir(path):
        pdfFileObj = open(fp, 'rb')
        reader = PdfReader(pdfFileObj)
        number_of_pages = len(reader.pages)
        page = reader.pages[0]
        text = page.extract_text()
        Title, part_number, material, f_treatments = extractText(text)
        printAll(Title, part_number, material, f_treatments)
        pdfFileObj.close()

where path = r'C:\Users\myname\Documents\TargetFile'

It reads the first file (1.pdf) in TargetFile successfully, but raises this error when it reaches the second file (2.pdf):

[Errno 2] No such file or directory: '2.pdf'

which is peculiar, given that the program must already know 2.pdf is in the folder in order to name it in the error message. I suspect that os.listdir() is finding the file but that pdfFileObj = open(fp, 'rb') is not, since the error is raised from that line.

Do you know what the issue might be based on the information I've provided?

I thought that closing the file at the end of each loop iteration would help, but it makes no difference. I've never worked with 'rb' (read-binary) mode before, but since it works for the first file I don't expect it to be the issue.

Tyler
  • Update: the mistake was that I specified the path only in os.listdir(path) and not in the open(fp, 'rb') call. I assumed open() would look in the same directory that was passed to listdir(), but listdir() returns bare file names, so open() resolved them against the current working directory and the program was searching one folder for the contents of another. A copy of 1.pdf happened to sit in both folders, which is why the first file opened successfully and the rest failed. – Tyler Nov 21 '22 at 17:48
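
For anyone hitting the same error, here is a minimal sketch of the fix described in the update: join the folder onto each name that os.listdir() returns before passing it to open(). The helper names pdf_paths and read_headers are hypothetical and not part of the original script, and reading the first few bytes stands in for the PdfReader step so the example needs only the standard library.

```python
import os


def pdf_paths(folder):
    """Return the full paths of the PDF files in `folder`, sorted by name."""
    return [
        # os.listdir() yields bare names such as '2.pdf', so the folder
        # has to be joined back on before the file can be opened.
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".pdf")
    ]


def read_headers(folder, n=5):
    """Open each PDF in binary mode and map its file name to its first n bytes."""
    headers = {}
    for full_path in pdf_paths(folder):
        # The with-block closes the file even if an exception is raised.
        with open(full_path, "rb") as pdf_file:
            headers[os.path.basename(full_path)] = pdf_file.read(n)
    return headers
```

In the loop from the question, the PdfReader(pdfFileObj) call and the extraction steps would presumably go inside the with-block in place of pdf_file.read(n), and the explicit close() would no longer be needed.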
