1

I am looking into PyPDF2 to read the bookmarks of a PDF.

Can anyone point me in the right direction on how to read the bookmarks from a PDF and then split the PDF based on them? I am pretty sure I can figure out how to split it once I know how to identify the bookmarks.

Thanks

Irwene
Christian

2 Answers

1

It took me quite a while to figure this out, so I put my answer here as it may help others.

The outline is a nested list of Destination objects (see the definition of the Destination class).

You can get the PDF outline using:

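from PyPDF2 import PdfFileReader

# pdf is a path to the file, or a file object opened in binary mode
reader = PdfFileReader(pdf)
outlines = reader.outlines  # nested list of Destination objects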

For each heading that has child headings, the parent heading is a Destination object, immediately followed by a list of Destination objects for its children:

parent_destination
[child_destination1, child_destination2, ......]

If a heading has no child headings, it is followed by a sibling Destination rather than a list:

destination1
destination2
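
The nesting described above can be walked with a small recursive helper. This is only a sketch: `walk_outline` is a hypothetical name, and the sample data uses plain strings as stand-ins for Destination objects so that it runs without a PDF. It assumes, as described in this answer, that child headings always arrive as a list right after their parent.

```python
def walk_outline(outlines, depth=0):
    """Yield (depth, entry) for every entry in a nested outline list."""
    for entry in outlines:
        if isinstance(entry, list):
            # A list holds the children of the entry just before it.
            yield from walk_outline(entry, depth + 1)
        else:
            yield depth, entry


# Stand-in data: plain strings in place of Destination objects.
sample = ["Chapter 1", ["Section 1.1", "Section 1.2"], "Chapter 2"]
flat = list(walk_outline(sample))
for depth, entry in flat:
    print("  " * depth + entry)
```

With a real file you would pass `reader.outlines` instead of `sample` and read `entry.title` from each yielded item.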

Each Destination contains

  • title: the text content of a heading
  • page: page number
  • other properties

which can be used to split the pdf.
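
As a rough sketch of the splitting step, the code below turns a list of bookmark start pages into page ranges and writes one file per range. The names `page_ranges` and `split_pdf` and the output file naming are my own assumptions, not part of PyPDF2. It also assumes the older PyPDF2 API used in this answer (`PdfFileReader`, `PdfFileWriter`, `getPage`, `addPage`, `getNumPages`), and that you already have 0-based start pages. In the versions I have used, `reader.getDestinationPageNumber(destination)` returns such an index, which is useful if `page` turns out to be a page reference rather than a plain number.

```python
def page_ranges(start_pages, total_pages):
    """Turn 0-based bookmark start pages into (start, end) pairs, end exclusive."""
    starts = sorted(set(start_pages))
    ends = starts[1:] + [total_pages]
    return list(zip(starts, ends))


def split_pdf(path, start_pages, prefix="part"):
    """Write one PDF per range (hypothetical helper, older PyPDF2 API assumed)."""
    # Imported here so page_ranges can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    reader = PdfFileReader(path)
    ranges = page_ranges(start_pages, reader.getNumPages())
    for n, (start, end) in enumerate(ranges):
        writer = PdfFileWriter()
        for i in range(start, end):
            writer.addPage(reader.getPage(i))
        with open(f"{prefix}_{n}.pdf", "wb") as fh:
            writer.write(fh)


ranges = page_ranges([0, 3, 7], total_pages=10)
print(ranges)
```

Note that pages before the first bookmark are not written to any output file with this approach; add `0` to the start pages if you want to keep them.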

Hope this helps.

Ahaha
0

It looks like PyPDF2 has the functionality you need. You might find what you need in this post.

Community
Leighner