Identifying Bookmarks using Python

Question

I was looking into PyPDF2 in order to read bookmarks from a PDF.

Can anyone point me in the right direction on how to read the bookmarks from a PDF and then split the PDF based on them? I am pretty sure I can figure out how to split it once I know how to identify the bookmarks.

Thanks

score 1 · Answer 1 · answered Oct 03 '17 at 17:35

It took me quite a while to figure this out, so I put my answer here as it may help others.

The outlines property contains a nested list of Destination objects (see the definition of the Destination class).

You can get the PDF outline with:

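from PyPDF2 import PdfFileReader

# pdf is a path to the PDF file, or a file object opened in binary mode
reader = PdfFileReader(pdf)
outlines = reader.outlines  # nested list of Destination objects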

A heading that has child headings appears as a Destination object immediately followed by a list that holds its children as Destination objects:

parent_destination
[child_destination1, child_destination2, ......]

If a heading has no child headings, it is followed directly by its sibling Destination rather than by a list:

destination1
destination2
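The nesting described above can be walked recursively. The following is only a minimal sketch: `walk_outline` and the `Heading` stand-in are hypothetical names introduced for illustration, and the sample list imitates the shape of `reader.outlines` so the snippet runs without a PDF.

```python
from collections import namedtuple

# Stand-in for PyPDF2's Destination, used only so the sketch runs on its own.
Heading = namedtuple("Heading", "title")


def walk_outline(outlines, depth=0):
    """Yield (depth, item) for every heading in a nested outline list."""
    for item in outlines:
        if isinstance(item, list):
            # A list holds the children of the heading just before it.
            yield from walk_outline(item, depth + 1)
        else:
            yield depth, item


sample = [Heading("Chapter 1"), [Heading("1.1"), Heading("1.2")], Heading("Chapter 2")]
for depth, heading in walk_outline(sample):
    print("  " * depth + heading.title)
```

With a real file, passing `reader.outlines` instead of `sample` should work the same way, since only the `title` attribute and the list nesting are used.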

Each Destination contains

title: the text content of a heading
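page: the target page (in PyPDF2 this is typically a reference to the page rather than a plain integer; reader.getDestinationPageNumber(destination) returns the zero-based page index)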
other properties

which can be used to split the pdf.
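One possible way to do the split is sketched below, assuming the PyPDF2 1.x API used in this answer (`PdfFileReader`, `PdfFileWriter`, `getDestinationPageNumber`); newer releases renamed these. `page_ranges`, `split_pdf`, and the `part_NN.pdf` output names are hypothetical, and the sketch assumes the top-level bookmarks appear in page order.

```python
def page_ranges(starts, num_pages):
    """Turn (title, first_page) pairs into (title, first, end_exclusive) ranges."""
    ranges = []
    for i, (title, first) in enumerate(starts):
        # A section ends where the next bookmark begins, or at the end of the file.
        end = starts[i + 1][1] if i + 1 < len(starts) else num_pages
        if end > first:  # skip empty sections (two bookmarks on the same page)
            ranges.append((title, first, end))
    return ranges


def split_pdf(path):
    """Write one PDF per top-level bookmark (assumes the PyPDF2 1.x API)."""
    # Imported here so page_ranges can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(path, "rb") as source:
        reader = PdfFileReader(source)
        # Top-level bookmarks only: nested lists hold child headings.
        starts = [
            (dest.title, reader.getDestinationPageNumber(dest))
            for dest in reader.outlines
            if not isinstance(dest, list)
        ]
        sections = page_ranges(starts, reader.getNumPages())
        for n, (title, first, end) in enumerate(sections):
            writer = PdfFileWriter()
            for page_index in range(first, end):
                writer.addPage(reader.getPage(page_index))
            # Hypothetical naming scheme; the title could be used instead.
            with open("part_%02d.pdf" % n, "wb") as out:
                writer.write(out)
```

Keeping the range arithmetic in its own function means it can be checked without a PDF at hand. Pages before the first bookmark are not written to any part in this sketch.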

Hope this helps.

score 0 · Answer 2 · edited May 23 '17 at 11:58

It looks like PyPDF2 has the functionality you need. You might find what you need in this post.


answered Aug 12 '15 at 13:47

Leighner
