I am currently using PyPDF2 as a dependency, and I have also tried PyPDF4.

I have encountered some encrypted files and handled them as you normally would (see the following code):

import PyPDF2
import PyPDF4

pdfFileObj = open(r'path', 'rb')

# creating a pdf reader object (works up to here)
pdfReader = PyPDF4.PdfFileReader(pdfFileObj)

# printing the number of pages in the pdf file (fails from here on)
print(pdfReader.numPages)

# creating a page object (index 1 is the second page)
pageObj = pdfReader.getPage(1)

# extracting text from the page
print(pageObj.extractText())

# closing the pdf file object
pdfFileObj.close()

This gives the error:

PdfReadError: File has not been decrypted

Opening the file into the pdfFileObj variable and creating the reader both work. But as soon as execution reaches print(pdfReader.numPages), it raises the error "PyPDF2.utils.PdfReadError: File has not been decrypted".

How do I get rid of this error? I can open the PDF file just fine by double-clicking it (it opens in Adobe Reader by default).

Abby

2 Answers

I've seen the same issue. The conclusion I came to was that PyPDF2 can't be trusted with encrypted files! I would recommend trying an alternative if you can. You might like pikepdf, which is built on top of QPDF (a well-known C++ library): https://pikepdf.readthedocs.io/en/latest/
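
For illustration, here is a minimal sketch of what that could look like, assuming the file opens without a password prompt (so an empty user password is enough). The helper name save_decrypted_copy and both path arguments are made up for this example; pikepdf.open, Pdf.pages and Pdf.save are the pikepdf calls it relies on.

```python
def save_decrypted_copy(src_path, dst_path, password=""):
    """Open src_path with pikepdf and write an unencrypted copy to dst_path.

    Returns the number of pages in the document.
    """
    # Third-party dependency (pip install pikepdf), imported inside the
    # function so that defining this helper does not itself require it.
    import pikepdf

    # password="" is an assumption: it covers PDFs that a viewer opens
    # without asking for a password. A wrong password raises
    # pikepdf.PasswordError.
    with pikepdf.open(src_path, password=password) as pdf:
        page_count = len(pdf.pages)
        # Saving without an `encryption` argument writes the copy unencrypted.
        pdf.save(dst_path)
    return page_count
```

The copy written to dst_path should then no longer be flagged as encrypted, so it can be handed to whatever reader you were using before.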

PirateNinjas

You can use the qpdf command-line tool together with the subprocess module to decrypt the PDF file.

sudo apt-get install -y qpdf

This code snippet decrypts the PDF file (file_path) and saves the result to new_file_path.

import os
import shlex
import subprocess

new_file_path = file_path.replace('.pdf','_decrypt.pdf').replace('.PDF','_decrypt.pdf')
# quote both paths so spaces or shell metacharacters do not break the command
cmd = "qpdf --decrypt " + shlex.quote(file_path) + " " + shlex.quote(new_file_path)
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        shell=True, preexec_fn=os.setsid)
stdout, stderr = proc.communicate()
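
A variant of the same idea, sketched without a shell: passing the arguments as a list to subprocess.run means the paths need no quoting. The function names decrypted_path, qpdf_decrypt_command and decrypt_with_qpdf are made up for this example, and it assumes qpdf is installed and on the PATH.

```python
import os
import subprocess


def decrypted_path(file_path):
    """Return file_path with its extension replaced by '_decrypt.pdf'."""
    root, _ext = os.path.splitext(file_path)
    return root + "_decrypt.pdf"


def qpdf_decrypt_command(file_path, new_file_path):
    """Build the qpdf argument list; no shell is involved, so no quoting is needed."""
    return ["qpdf", "--decrypt", file_path, new_file_path]


def decrypt_with_qpdf(file_path):
    """Run qpdf on file_path and return the path of the decrypted copy."""
    new_file_path = decrypted_path(file_path)
    # check=True raises CalledProcessError on any non-zero exit status,
    # which for qpdf includes runs that only produced warnings.
    subprocess.run(
        qpdf_decrypt_command(file_path, new_file_path),
        check=True,
        capture_output=True,
    )
    return new_file_path
```

Calling decrypt_with_qpdf(file_path) would leave the original file untouched and return the path of the new copy.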
Aditya