pypdf for lists of pdfs

Question

I have pyPdf working fine for a single PDF file, but I cannot get it to work for a list of files, or in a for loop over multiple PDFs. It fails with "'str' object is not callable". Any ideas for a workaround?

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

#print getPDFContent(r"Z:\GIS\MasterPermits\12300983.pdf").encode("ascii", "ignore")


#find pdfs            
for root, dirs, files in os.walk(folder1):
    for file in files:
      if file.endswith(('.pdf')):
          d=os.path.join(root, file)
          print getPDFContent(d).encode("ascii", "ignore")

Traceback (most recent call last):
  File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 50, in <module>
    print getPDFContent(d).encode("ascii", "ignore")
  File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 32, in getPDFContent
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
TypeError: 'str' object is not callable

I was using a list before, but I got the exact same error. I didn't think this would be a big deal, but right now it is becoming one. I know I was able to work around similar issues in arcpy, but this is nothing like those.

It would help if you provided a complete program. Please reduce your program to the shortest possible complete, runnable program that demonstrates the problem and paste it into your question. See http://SSCCE.org for more info about this debugging technique. — Robᵩ, Jul 23 '13 at 19:11
At the moment that you call `file(path, "rb")`, I suspect that `file` doesn't mean what you think it means. Try adding `print type(file), file` immediately before the failing call. Do you use the variable name `file` anywhere else in your program? — Robᵩ, Jul 23 '13 at 19:12

score 2 · Answer 1 · answered Jul 23 '13 at 19:15


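Try not to use the names of built-ins as variable names. Your loop variable `file` rebinds the built-in `file` type, so by the time `getPDFContent` calls `file(path, "rb")`, `file` is a string and the call fails.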

Don't do this:

for file in files:

Do this instead:

for myfile in files:
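
To make the effect concrete, here is a minimal sketch of both the failure and the fix. The names `shadowing_demo` and `find_pdfs` are made up for illustration and are not part of the question's script. The demo rebinds `len` rather than `file` so that it also runs on Python 3, where the `file` built-in no longer exists.

```python
import os


def shadowing_demo():
    # Rebinding a built-in name hides the built-in in this scope,
    # just as `for file in files` hides Python 2's `file`.
    len = "12300983.pdf"
    try:
        len("abc")  # this now calls a str, not the built-in function
    except TypeError as exc:
        return str(exc)


def find_pdfs(folder):
    # Hypothetical helper: collect the full paths of all .pdf files
    # under `folder`, using loop names that shadow nothing.
    paths = []
    for root, dirs, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(".pdf"):
                paths.append(os.path.join(root, filename))
    return sorted(paths)
```

Each path returned by `find_pdfs` could then be passed to `getPDFContent`. Opening the file there with `open(path, "rb")` instead of `file(path, "rb")` would also sidestep the clash, assuming nothing else in the script rebinds `open`.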

answered Jul 23 '13 at 19:15

Robᵩ

