I'm following along in Al Sweigart's book 'Automate the Boring Stuff' and I'm at a loss with an index error I'm getting. I'm working with PyPDF2, trying to open an encrypted PDF document. I know the book is from 2015, so I went to the PyPDF2.PdfFileReader docs to see if I'm missing anything, and everything seems to be the same, at least from what I can tell. So I'm not sure what's wrong here.

My Code

import PyPDF2
reader = PyPDF2.PdfFileReader('encrypted.pdf')
reader.isEncrypted  # is True
reader.pages[0]

gives:

Traceback (most recent call last):
  File "<pyshell#65>", line 1, in <module>
    pdfReader.getPage(0)
  File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1176, in getPage
    self._flatten()
  File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1505, in _flatten
    catalog = self.trailer["/Root"].getObject()
  File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1617, in getObject
    raise utils.PdfReadError("file has not been decrypted")
PyPDF2.utils.PdfReadError: file has not been decrypted
>>> pdfReader.decrypt('rosebud')
1
>>> pageObj = reader.getPage(0)
Traceback (most recent call last):
  File "<pyshell#67>", line 1, in <module>
    pageObj = pdfReader.getPage(0)
  File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1177, in getPage
    return self.flattenedPages[pageNumber]
IndexError: list index out of range

Before asking my question, I did some searching on Google and found this link with a "proposed fix". However, I'm too new at this to see what the fix is; I can't make heads or tails of it.

Martin Thoma
User67

2 Answers


I figured it out. The issue is caused by running `pdfReader.getPage(0)` in the IDLE shell before decrypting the file. If you leave that line out, or start over after getting the error and skip that line, it works as it should.

User67
  • A complement to this answer: if you call `getPage()` and the exception happens, then call `decrypt()` and call `getPage()` again, you will still have the problem. To solve that, you need to create a new `PdfFileReader` object. – brandizzi Feb 18 '20 at 21:58
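
The advice in this answer and its comment can be sketched as a small helper that always builds a fresh reader and decrypts it before any page is touched. The traceback in the question suggests that the failed `getPage(0)` call leaves the reader's cached page list (`flattenedPages`) empty, which would explain why a later `decrypt()` on the same object does not help. The names `open_decrypted` and `reader_factory` are hypothetical and only there for illustration; the sketch assumes the older PyPDF2 API used in the question (`PdfFileReader`, `isEncrypted`, and `decrypt()` returning 0 when the password does not match).

```python
def open_decrypted(path, password, reader_factory=None):
    """Return a new reader that is decrypted before any page is accessed.

    `reader_factory` is a hypothetical hook for illustration; by default
    it is PyPDF2.PdfFileReader (the pre-3.0 class name used above).
    """
    if reader_factory is None:
        # Imported lazily so the helper can also be tried with a stand-in class.
        import PyPDF2
        reader_factory = PyPDF2.PdfFileReader

    # Always create a new object instead of reusing one that already
    # failed on getPage() / pages[...].
    reader = reader_factory(path)

    if reader.isEncrypted:
        # decrypt() returns 0 if the password did not match.
        if reader.decrypt(password) == 0:
            raise ValueError("password did not match")
    return reader
```

With the file from the question, this would be called as `reader = open_decrypted('encrypted.pdf', 'rosebud')` followed by `reader.pages[0]`. In an interactive session the same effect comes from simply re-running the `PdfFileReader(...)` line before calling `decrypt()`.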

I got the same error. I was working in the console and called reader.getPage(0) before decrypting. Don't use getPage(#) or pages[#] before decrypt.

Use code like this:

import PyPDF2

reader = PyPDF2.PdfFileReader("file.pdf")
# reader.pages[0]    # do not use this before decrypt
if reader.isEncrypted:
    reader.decrypt('')  # pass the document's password here
reader.pages[0]
Martin Thoma
Arpana