https://www.fda.gov/downloads/AboutFDA/ReportsManualsForms/Forms/UCM074728.pdf

I'm trying to read this PDF using PyPDF2 or pdfminer. PyPDF2 says the file has not been decrypted, and pdfminer says that it can't decompress the PDF. Can somebody let me know how to do this in a Python 3 Windows environment? I can't use poppler because I can't install it on this Windows machine.

user222213

1 Answer

This is a restricted PDF file. In most cases you can decrypt a file that doesn't prompt you for a password using PyPDF2 with an empty string:

from PyPDF2 import PdfFileReader

reader = PdfFileReader('sample.pdf')
reader.decrypt('')

Unfortunately, that is not the case for your file, or for any other file with 128-bit AES encryption. PyPDF2's decrypt() method does not support that encryption level and raises a NotImplementedError.
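To make the two outcomes easier to tell apart in a script, here is a minimal sketch. The helper name try_decrypt is hypothetical, not part of PyPDF2. It assumes the reader object behaves like PyPDF2's PdfFileReader, with an isEncrypted attribute and a decrypt(password) method that returns 0 when the password does not match.

```python
def try_decrypt(reader, password=''):
    """Return True if `reader` is readable after this call, False otherwise.

    `reader` is assumed to behave like PyPDF2's PdfFileReader: it exposes
    an `isEncrypted` attribute and a `decrypt(password)` method.
    """
    if not reader.isEncrypted:
        return True  # nothing to decrypt
    try:
        # decrypt() returns 0 when the password does not match
        return reader.decrypt(password) != 0
    except NotImplementedError:
        # Unsupported encryption scheme (for example 128-bit AES):
        # the caller should fall back to another tool.
        return False


# Hypothetical usage:
# from PyPDF2 import PdfFileReader
# reader = PdfFileReader('sample.pdf')
# if not try_decrypt(reader):
#     print('PyPDF2 cannot open this file; try another tool')
```

A False result only says that PyPDF2 could not open the file with that password. It does not tell you which encryption scheme was used.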

As a simple workaround you can save this file as a new file in Adobe Reader or similar and the new file should work for your code.

You can also do it programmatically using qpdf, as discussed in this GitHub issue:

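import os
import shutil
import tempfile
from subprocess import check_call


def decrypt_pdf(filename):
    """Decrypt filename in place; needs the qpdf executable on PATH."""
    # Create the temp dir next to the file so the final move stays on one volume.
    tempdir = tempfile.mkdtemp(dir=os.path.dirname(os.path.abspath(filename)))
    try:
        temp_out = os.path.join(tempdir, 'qpdf_out.pdf')
        # An empty password opens files that only carry usage restrictions.
        check_call(['qpdf', '--password=', '--decrypt', filename, temp_out])
        shutil.move(temp_out, filename)
        print('File Decrypted')
    finally:
        shutil.rmtree(tempdir)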
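Note that check_call raises FileNotFoundError when the qpdf executable itself cannot be found, which is easy to mistake for a missing PDF. Here is a rough sketch that checks for the executable first. The helper names build_qpdf_command and decrypt_with_qpdf are hypothetical, and it assumes qpdf is installed somewhere on PATH.

```python
import shutil
import subprocess


def build_qpdf_command(src, dst, password='', qpdf='qpdf'):
    """Build the argument list for decrypting `src` into `dst` with qpdf."""
    return [qpdf, '--password=' + password, '--decrypt', src, dst]


def decrypt_with_qpdf(src, dst, password=''):
    """Decrypt `src` into `dst`, failing early if qpdf cannot be found."""
    qpdf = shutil.which('qpdf')  # full path to the executable, or None
    if qpdf is None:
        raise FileNotFoundError('qpdf executable not found on PATH')
    subprocess.check_call(build_qpdf_command(src, dst, password, qpdf))
    return dst


# Hypothetical usage:
# decrypt_with_qpdf('sample.pdf', 'sample_decrypted.pdf')
```

Writing to a separate output file leaves the original untouched, which is convenient while you are still checking that the decrypted copy opens correctly.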
xilopaint
  • Worked like a charm, so thank you. First I decrypted using qpdf, then I got all the fields in the PDF. It was amazing. Why don't we implement this feature in PyPDF2? – user222213 Apr 15 '18 at 04:17
  • Is there any way to identify whether a PDF contains URLs, bookmarks, annotations and comments using PyPDF2? @xilopaint – user222213 Apr 15 '18 at 04:24
  • Hi, getting a file not found error in `check_call`. :( – Sid Jun 08 '18 at 13:14