How to read this pdf form using PyPDF2 in python

Question

https://www.fda.gov/downloads/AboutFDA/ReportsManualsForms/Forms/UCM074728.pdf

I'm trying to read this PDF using PyPDF2 or pdfminer. PyPDF2 reports that the file has not been decrypted, and pdfminer reports that it can't decompress the PDF. Can somebody tell me how to do this in a Python 3 environment on Windows? I can't use poppler because I can't install it on this machine.

I suggest removing the URL from the question-title; it's sufficient to include it in the question's body-text. — Jeremy Friesner, Apr 14 '18 at 02:08

xilopaint · Accepted Answer · 2018-04-14T10:06:23.997

This is a restricted PDF file. In most cases you can decrypt a file that doesn't prompt you for a password using PyPDF2 with an empty string:

from PyPDF2 import PdfFileReader

reader = PdfFileReader('sample.pdf')
reader.decrypt('')

Unfortunately, that is not the case for your file, or for any other file with 128-bit AES encryption. The PyPDF2 decrypt() method does not support that encryption level and raises a NotImplementedError.
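
Because decrypt() can either report a failure or raise, it can help to wrap the attempt in a small helper. The following is a minimal sketch, assuming the legacy PyPDF2 1.x interface used above (an isEncrypted attribute, and decrypt() returning 0 when the password does not match). try_decrypt is a hypothetical helper name, not part of PyPDF2.

```python
def try_decrypt(reader, password=''):
    """Return True if `reader` can be read after trying `password`.

    `reader` is assumed to behave like a PyPDF2 1.x PdfFileReader:
    it exposes `isEncrypted` and a `decrypt(password)` method that
    returns 0 when the password does not match.
    """
    if not reader.isEncrypted:
        return True  # nothing to decrypt
    try:
        return reader.decrypt(password) != 0
    except NotImplementedError:
        # Raised for encryption schemes PyPDF2 1.x cannot handle,
        # such as the 128-bit AES used by the file in the question.
        return False


# Usage (assumes PyPDF2 1.x is installed and sample.pdf exists):
#   from PyPDF2 import PdfFileReader
#   reader = PdfFileReader('sample.pdf')
#   if not try_decrypt(reader):
#       print('PyPDF2 cannot decrypt this file; try qpdf instead')
```

A False result here means the file has to be decrypted by some other tool before PyPDF2 can read it.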

As a simple workaround, you can save this file as a new file in Adobe Reader or a similar application, and the new file should work with your code.

You can also do it programmatically using qpdf, as discussed in this GitHub issue:

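import os
import shutil
import tempfile
from subprocess import check_call

filename = 'sample.pdf'  # path to the encrypted PDF

# Create the temporary directory next to the input file.
tempdir = tempfile.mkdtemp(dir=os.path.dirname(os.path.abspath(filename)))
try:
    temp_out = os.path.join(tempdir, 'qpdf_out.pdf')
    # An empty password is enough for files that open without a prompt.
    check_call(['qpdf', '--password=', '--decrypt', filename, temp_out])
    # Replace the original file with the decrypted copy.
    shutil.move(temp_out, filename)
    print('File Decrypted')
finally:
    # Clean up the temporary directory even if qpdf fails.
    shutil.rmtree(tempdir)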
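
If overwriting the original file is undesirable, the same qpdf call can write to a separate output path instead. This is a minimal sketch, assuming qpdf is installed and on PATH; the function names and file names are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess


def build_qpdf_command(src, dst, password=''):
    """Build the qpdf argument list for writing a decrypted copy of src to dst."""
    return ['qpdf', '--password=' + password, '--decrypt', src, dst]


def decrypt_copy(src, dst, password=''):
    """Write a decrypted copy of `src` to `dst`, leaving `src` untouched."""
    if shutil.which('qpdf') is None:
        raise FileNotFoundError('qpdf executable not found on PATH')
    # check=True raises CalledProcessError if qpdf exits with a non-zero status.
    subprocess.run(build_qpdf_command(src, dst, password), check=True)
    return dst


# Usage (file names are placeholders):
#   decrypt_copy('sample.pdf', 'sample_decrypted.pdf')
```

Keeping the command construction in its own function makes it easy to inspect the exact arguments before anything is executed.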

Worked like a charm, so thank you. First I decrypted the file using qpdf, then I got all the fields in the PDF. It was amazing. Why don't we implement this feature in PyPDF2? — user222213, Apr 15 '18 at 04:17
Is there any way to identify whether a PDF contains URLs, bookmarks, annotations, and comments using PyPDF2? @xilopaint — user222213, Apr 15 '18 at 04:24
