PdfFileReader expects a seekable, open stream. It does not load the entire file into memory, so you have to keep the file open for as long as you call methods such as getPage. Your hypothesis that creating a reader automatically reads in the whole file is incorrect.
A with statement operates on a context manager, such as a file. When the with block ends, the context manager's __exit__ method is called. In this case, it closes the file handle that your PdfFileReader is trying to use to get the second page.
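The closing behavior can be seen without PyPDF2 at all. The following is a minimal sketch using only the standard library; the temporary file and the variable names are illustrative stand-ins, not part of the original question.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some bytes")
    path = tmp.name

with open(path, "rb") as f:
    first = f.read(4)            # works: the handle is open inside the block
    open_inside = not f.closed

# Leaving the block called f.__exit__(), which closed the handle.
closed_after = f.closed

try:
    f.read(4)                    # any further read on the closed handle fails
    error = None
except ValueError as exc:
    error = exc

os.remove(path)
print(open_inside, closed_after, type(error).__name__)
```

A reader object that still holds a reference to f after the block is in the same position as the f.read(4) call in the try clause.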
As you found out, the correct procedure is to read what you need from the PDF before you close the file. If, and only if, your program needs the PDF open until the very end, you can pass the file name directly to PdfFileReader. There is no (documented) way to close the file after that, though, so I would recommend your original approach:
from PyPDF2 import PdfFileReader

with open('HTTP_Book.pdf', 'rb') as file:
    pdf = PdfFileReader(file)
    page = pdf.getPage(1)
    print(page.extractText())
# file is closed here, pdf will no longer do its job
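If the reader has to outlive the with block, one further option is to copy the file's bytes into an in-memory stream first, since io.BytesIO is seekable and stays open after the on-disk file is closed. This is only a sketch, assuming the PDF fits comfortably in memory; load_into_memory is a hypothetical helper, the placeholder bytes are not a real PDF, and the PdfFileReader call is left as a comment because it is assumed usage that is not exercised here.

```python
import io
import os
import tempfile


def load_into_memory(path):
    """Return a seekable in-memory copy of the file at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        return io.BytesIO(f.read())


# Stand-in file; in practice this would be the PDF, e.g. 'HTTP_Book.pdf'.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"%PDF-1.4 placeholder bytes")
    demo_path = tmp.name

stream = load_into_memory(demo_path)
os.remove(demo_path)

# The on-disk file is closed (and here even deleted), but the copy is still usable.
stream.seek(0)
header = stream.read(8)
# pdf = PdfFileReader(stream)  # assumed usage with a real PDF; not run here
```

The trade-off is memory: the whole file is held in RAM, which is exactly what PdfFileReader avoids by reading from the open stream on demand.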