I'm using PyPDF2 to alter a PDF document (adding bookmarks). So I need to read in the entire source PDF, and write it out, keeping as much of the data intact as possible. Merely writing each page into a new PDF object may not be sufficient to preserve document metadata.
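For reference, the page-by-page fallback I have in mind would look roughly like the sketch below. It assumes the PyPDF2 1.x camelCase API (getNumPages, getPage, getDocumentInfo, addPage, addMetadata). The helper name copy_pages_and_info is my own, and coercing every info value to a plain string is a simplification. It only carries over the pages and the info dictionary, so anything else stored at document level would still be lost.

```python
def copy_pages_and_info(reader, writer):
    """Copy every page plus the document info dictionary from reader to writer.

    `reader` is expected to behave like PyPDF2's PdfFileReader and `writer`
    like PdfFileWriter (PyPDF2 1.x camelCase API). The function name is
    hypothetical and not part of PyPDF2.
    """
    for index in range(reader.getNumPages()):
        writer.addPage(reader.getPage(index))

    info = reader.getDocumentInfo()  # may be None when the PDF has no /Info entry
    if info:
        # Simplification: coerce every value to a plain string before copying.
        writer.addMetadata({key: str(value) for key, value in info.items()})
    return writer


# Usage (assumes PyPDF2 1.x is installed; the file name is a placeholder):
#   from PyPDF2 import PdfFileReader, PdfFileWriter
#   writer = copy_pages_and_info(PdfFileReader("in.pdf"), PdfFileWriter())
```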
PdfFileWriter() does have a number of methods for copying an entire file: cloneDocumentFromReader, appendPagesFromReader and cloneReaderDocumentRoot. However, they all have problems.
If I use cloneDocumentFromReader or appendPagesFromReader, I get a valid PDF file, with the correct number of pages, but all pages are blank.
If I use cloneReaderDocumentRoot, I get a minimal valid PDF file, but with no pages or data.
This has been asked before, but with no successful answers. Other questions have asked about Blank pages in PyPDF2, but I can't apply the answer given.
Here's my code:
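    from PyPDF2 import PdfFileReader, PdfFileWriter

    def bookmark(incomingFile):
        reader = PdfFileReader(incomingFile)
        writer = PdfFileWriter()

        # Copy the source pages into the writer (alternative left commented out)
        writer.appendPagesFromReader(reader)
        #writer.cloneDocumentFromReader(reader)

        # (title, zero-based page index) pairs
        my_table_of_contents = [
            ('Page 1', 0),
            ('Page 2', 1),
            ('Page 3', 2)
        ]
        # writer.addBookmark(title, pagenum, parent=None, color=None, bold=False, italic=False, fit='/Fit')
        for title, pagenum in my_table_of_contents:
            writer.addBookmark(title, pagenum, parent=None)

        # Ask viewers to open the document with the outline panel visible
        writer.setPageMode("/UseOutlines")

        # Note: this writes the result back over the input file
        with open(incomingFile, "wb") as fp:
            writer.write(fp)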
I tend to get errors when PyPDF2 can't add a bookmark to the PdfFileWriter object, because it doesn't have any pages, or similar.
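To separate the "no pages were copied" failure from a genuinely bad bookmark target, a check along these lines could run before the addBookmark loop. This is only a sketch: valid_bookmarks is a hypothetical helper of mine, not a PyPDF2 function, and I am assuming the page count would come from writer.getNumPages() in PyPDF2 1.x.

```python
def valid_bookmarks(table_of_contents, page_count):
    """Split (title, page_index) entries into usable and out-of-range ones."""
    usable, rejected = [], []
    for title, page_index in table_of_contents:
        if 0 <= page_index < page_count:
            usable.append((title, page_index))
        else:
            rejected.append((title, page_index))
    return usable, rejected


toc = [("Page 1", 0), ("Page 2", 1), ("Page 3", 2)]

# Pretend the writer only ended up with two pages.
usable, rejected = valid_bookmarks(toc, page_count=2)
print(usable)    # [('Page 1', 0), ('Page 2', 1)]
print(rejected)  # [('Page 3', 2)]
```

If page_count is 0, every entry lands in rejected, which would at least confirm that the copy step, not the bookmark step, is where things go wrong.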