
I am merging one PDF onto another PDF. The merge works fine, but the bookmarks are missing from the final PDF.

This is the PDF generation code:

```python
#- Create a one-page PDF with some text
from reportlab.lib.colors import Color
from reportlab.pdfgen import canvas as canx

c = canx.Canvas('transparent.pdf')
c.setStrokeColor((1, 0, 0))                   # red outline
transparentwhite = Color(1, 1, 1, alpha=0.0)  # ReportLab color components run from 0 to 1
c.setFillColor(transparentwhite)
t = c.beginText()
t.setTextRenderMode(2)                        # 2 = fill and stroke the glyphs
c._code.append(t.getCode())                   # writes the render mode into the page stream (private API)
c.setFont('Helvetica', 48)
c.saveState()
c.translate(100, 100)
c.rotate(45)
c.drawCentredString(500, 100, 'DRAFT')
c.restoreState()
c.save()
```

This is the merge code:

```python
#- Merge PDF.
import PyPDF2 as pdf

transparentbannerfile = open('transparent.pdf', 'rb')
testpagefile = open('NOID_body20160202T133650.pdf', 'rb')
outputfile = open('mergedtransparent.pdf', 'wb')
readerbanner = pdf.PdfFileReader(transparentbannerfile)
readertestpages = pdf.PdfFileReader(testpagefile)
bannerpage = readerbanner.getPage(0)
writeroutput = pdf.PdfFileWriter()
for x in range(readertestpages.getNumPages()):
    pagex = readertestpages.getPage(x)
    pagex.mergePage(bannerpage)  # stamp the DRAFT banner onto the page
    writeroutput.addPage(pagex)

writeroutput.write(outputfile)
outputfile.close()
transparentbannerfile.close()
testpagefile.close()
```

But the bookmarks are missing, and so is the title in the metadata.

What do the canvas object's 'bookmarkHorizontal', 'bookmarkHorizontalAbsolute', and 'bookmarkPage' methods do?

The same question was also asked here: How to Add bookmarks to PDF file?

Asked by Vivek Sable

2 Answers


I can get the title with the following code:

```python
from PyPDF2 import PdfFileReader

with open('NOID_body20160202T133650.pdf', 'rb') as f:
    pdf_toread = PdfFileReader(f)
    pdf_info = pdf_toread.getDocumentInfo()  # document information dictionary (/Title, /Author, ...)
    print(pdf_info)
```

Set the title of the new PDF with the canvas's setTitle method:

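```python
#- Create a one-page PDF with some text
from reportlab.pdfgen import canvas as canx

c = canx.Canvas('transparent.pdf')
c.setTitle("Test to set Title")  # stored as /Title in the document information
# ... draw the watermark as in the question, then:
c.save()
```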

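Or set it on the PyPDF2 writer. This is the variant that affects the merged file, because PdfFileWriter does not copy the document information (title included) from the PDFs whose pages it adds: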

```python
import PyPDF2 as pdf

writeroutput = pdf.PdfFileWriter()
writeroutput.addMetadata({'/Title': u'Print Page Sizes'})
```

Bookmarks

  1. First, get the bookmarks from the input PDF with the code from How to get bookmark's page number.

  2. Then add the bookmarks to the new PDF with the following code:

```python
from collections import OrderedDict

import PyPDF2 as pdf

transparentbannerfile = open('transparent.pdf', 'rb')
testpagefile = open('NOID_body20160202T133650.pdf', 'rb')
outputfile = open('mergedtransparent112.pdf', 'wb')
readerbanner = pdf.PdfFileReader(transparentbannerfile)
readertestpages = pdf.PdfFileReader(testpagefile)
bannerpage = readerbanner.getPage(0)
writeroutput = pdf.PdfFileWriter()
for x in range(readertestpages.getNumPages()):
    pagex = readertestpages.getPage(x)
    pagex.mergePage(bannerpage)
    writeroutput.addPage(pagex)

# Bookmarks extracted from the input PDF; 'page' is 1-based.
a = OrderedDict([
    (u'SIDDHARTHA', {'top': 750, 'left': 0, 'page': 1, 'title': u'SIDDHARTHA'}),
    (u'Chapter 01', {'top': 750, 'left': 0, 'page': 3, 'title': u'Chapter 01'}),
    (u'Chapter 02', {'top': 503, 'left': 0, 'page': 6, 'title': u'Chapter 02'}),
    (u'Chapter 03', {'top': 340, 'left': 0, 'page': 11, 'title': u'Chapter 03'}),
    (u'Chapter 04', {'top': 231, 'left': 0, 'page': 17, 'title': u'Chapter 04'}),
    (u'Chapter 05', {'top': 909, 'left': 0, 'page': 30, 'title': u'Chapter 05'}),
    (u'Chapter 06', {'top': 614, 'left': 0, 'page': 32, 'title': u'Chapter 06'}),
    (u'Chapter 07', {'top': 417, 'left': 0, 'page': 35, 'title': u'Chapter 07'}),
    (u'Chapter 08', {'top': 289, 'left': 0, 'page': 41, 'title': u'Chapter 08'}),
])
for i in a:
    # addBookmark(title, pagenum, parent=None, ...) takes a 0-based page index.
    # The third positional argument is the parent bookmark, so the dict is not passed.
    writeroutput.addBookmark(i, a[i]["page"] - 1)

writeroutput.write(outputfile)
outputfile.close()
transparentbannerfile.close()
testpagefile.close()
```

Now, how do I handle nested bookmarks? :)

Answered by Vivek Sable

I had a bit of a unique use case. After using pdfrw to perform some bulk link modifications, I noticed that pdfrw does not copy bookmarks/outlines, which presented a wee bit of an issue considering I was working with a 552-page document. The original document and the one with the modified links were otherwise identical.

I was looking for something that could take the bookmarks/outline from the original document and the pages (links and annotations appear to belong to each page) from the link-modified document, and combine them into a new document that keeps the highly nested bookmark/outline structure instead of flattening it, as a lot of solutions seem to do. I also wanted a unified solution, without external tools to export/import bookmarks and without installing any additional packages. It's entirely possible that I missed something somewhere, but if you're reading this post, then maybe not.

I started off with the bookmarks section of @vivek-sable's post, which led me to @vjayky's post, designed to extract a bookmarks dict. It's possible that I missed something, but it looked like that code was returning a flat list instead of a nested one, and introduced what felt like an unnecessary step.

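Requirements: pip install PyPDF2 (or pip3 install PyPDF2). The code uses PyPDF2's older camelCase API (PdfFileReader, getOutlines, addBookmark), which PyPDF2 3.0 no longer supports, so install a 1.x/2.x release, e.g. pip install "PyPDF2<3.0".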

The function takes three string paths as input: the bookmark donor, the page donor, and the new file.

Steps:

  1. Copy pages from page donor to new PDF.
  2. Extract page map from bookmark donor (using @vjayky's code).
  3. Get outlines from bookmark donor.
  4. Recursively iterate over nested donor outlines, creating nested outlines in new document.
  5. Dump writer output to file.
  6. Clean up open files.
  7. Something, something...dark side...
```python
from pathlib import Path as p

from PyPDF2 import PdfFileReader as PyPDFReader
from PyPDF2 import PdfFileWriter as PyPDFWriter
from PyPDF2.generic import Destination


def copy_bookmarks(inBookmarks: str, inPages: str, outputFile: str):

    print('Opening bookmarks PDF.')
    inBookmarksBytes = p(inBookmarks).open('rb')
    print('Opening pages PDF.')
    inPagesBytes = p(inPages).open('rb')
    print('Opening output PDF.')
    outputFileBytes = p(outputFile).open('wb')

    print('Reading bookmarks PDF.')
    inBookmarksReader = PyPDFReader(inBookmarksBytes)
    print('Reading pages PDF.')
    inPagesReader = PyPDFReader(inPagesBytes)
    print('Initializing PDF writer.')
    outPDFWriter = PyPDFWriter()

    print('Copying input pages to writer.')
    for page_number in range(inPagesReader.getNumPages()):
        page = inPagesReader.getPage(page_number)
        outPDFWriter.addPage(page)

    def get_page_map(bookmarkPDF, pages=None, result=None, number_pages=None):
        # Map the object number of every node in the page tree to the
        # 0-based index of its first leaf page (@vjayky's code).
        if result is None:
            result = {}
        if pages is None:
            number_pages = []
            pages = bookmarkPDF.trailer["/Root"].getObject()["/Pages"].getObject()
        t = pages["/Type"]
        if t == "/Pages":
            for page in pages["/Kids"]:
                result[page.idnum] = len(number_pages)
                get_page_map(bookmarkPDF, page.getObject(), result, number_pages)
        elif t == "/Page":
            number_pages.append(1)
        return result

    def transfer_bookmarks(outPDFWriter, outlines, page_map, parent=None):
        # getOutlines() returns a list in which a nested list directly follows
        # the entry whose children it holds, so each nested list is attached
        # to the most recently created bookmark.
        _parent = parent
        for outline in outlines:
            if isinstance(outline, Destination):
                # Only the title and page are copied. '/Top' and '/Left' are not
                # read: they are absent for some destination types (e.g. '/Fit').
                _parent = outPDFWriter.addBookmark(
                    title=outline['/Title'],
                    pagenum=page_map[outline.page.idnum],  # 0-based page index
                    parent=parent,
                )
            elif isinstance(outline, list):
                transfer_bookmarks(outPDFWriter, outline, page_map, _parent)

        return outPDFWriter

    page_map = get_page_map(inBookmarksReader)
    outlines = inBookmarksReader.getOutlines()

    print('Copying bookmarks to writer.')
    outPDFWriter = transfer_bookmarks(outPDFWriter, outlines, page_map)

    print('Saving PDF writer output.')
    outPDFWriter.write(outputFileBytes)

    print('Closing output PDF.')
    outputFileBytes.close()

    print('Closing bookmarks PDF.')
    inBookmarksBytes.close()

    print('Closing pages PDF.')
    inPagesBytes.close()


bookmarks = 'path_to_bookmark_donor_PDF.pdf'
pages = 'path_to_page_donor_PDF.pdf'
out = 'path_to_frankenstein_monster_PDF.pdf'

copy_bookmarks(bookmarks, pages, out)
```
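Step 2 is the least obvious part: a bookmark points at a page object, not at a page number, so get_page_map walks the page tree (/Pages nodes with /Kids, leaf /Page nodes) and records where each object sits. The sketch below repeats that walk without PyPDF2, using plain dicts as hypothetical stand-ins for PDF objects ('id' plays the role of idnum), so the numbering rule can be checked in isolation. It is an illustration of the same logic, not PyPDF2's API.

```python
def page_map(node, result=None, number_pages=None):
    """Map each node id in a page tree to the 0-based index of its first leaf page.

    Nodes are plain dicts standing in for PDF objects:
    {'id': ..., 'Type': 'Pages', 'Kids': [...]} or {'id': ..., 'Type': 'Page'}.
    """
    if result is None:
        result = {}
    if number_pages is None:
        number_pages = []  # one entry per leaf page seen so far
    if node['Type'] == 'Pages':
        for kid in node['Kids']:
            # Record the position before descending, as get_page_map does.
            result[kid['id']] = len(number_pages)
            page_map(kid, result, number_pages)
    elif node['Type'] == 'Page':
        number_pages.append(1)
    return result


# Hypothetical tree: a root /Pages node with two direct pages and an
# intermediate /Pages node holding two more pages in between them.
tree = {'id': 1, 'Type': 'Pages', 'Kids': [
    {'id': 10, 'Type': 'Page'},
    {'id': 2, 'Type': 'Pages', 'Kids': [
        {'id': 11, 'Type': 'Page'},
        {'id': 12, 'Type': 'Page'},
    ]},
    {'id': 13, 'Type': 'Page'},
]}
print(page_map(tree))  # {10: 0, 2: 1, 11: 1, 12: 2, 13: 3}
```

Leaf pages get their document order even when they sit under an intermediate /Pages node. Intermediate nodes get the index of their first page, which is harmless because bookmarks point at leaf pages.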

The output should contain a properly nested bookmark structure:

[Screenshot: bookmark panel of the output PDF]
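To reason about the nesting without opening a viewer, it helps to know the shape that getOutlines() returns: a flat list in which a nested list directly after an entry holds that entry's children. transfer_bookmarks relies on exactly that rule when it passes the most recently created bookmark as parent. A library-free sketch of the same walk, with plain strings as hypothetical stand-ins for Destination objects, that records each title together with its parent and depth:

```python
def walk_outline(outlines, parent=None, depth=0, out=None):
    """Flatten a PyPDF2-style outline list into (title, parent, depth) tuples.

    A nested list belongs to the entry that precedes it: its items are that
    entry's children. Strings stand in for Destination objects here.
    """
    if out is None:
        out = []
    last = parent  # most recent entry at this level; parent of a following list
    for item in outlines:
        if isinstance(item, list):
            walk_outline(item, last, depth + 1, out)
        else:
            out.append((item, parent, depth))
            last = item
    return out


outline = ['Part 1', ['Chapter 1', 'Chapter 2', ['Section 2.1']], 'Part 2']
for title, parent, depth in walk_outline(outline):
    print('  ' * depth + title, '<-', parent)
```

Each (title, parent) pair corresponds to one addBookmark(title, pagenum, parent=...) call in transfer_bookmarks, which is why the nesting survives instead of being flattened.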

Answered by shr00mie