How can I read the bookmarks from a pdf in python?

Question

I have been searching high and low (pyPdf, PyPDF2, pdfminer and the like). All I want is to read the bookmarks of a PDF and their corresponding pages, so that if the bookmark "chapter 1" is on page 5 and I write print(bookmarks), it prints "chapter 1, 5" or something along those lines. Any ideas? Thanks!

Does this help you? http://stackoverflow.com/questions/8329748/how-to-get-bookmarks-page-number (first hit on Google...) — Jongware, Dec 30 '13 at 01:05
Thanks, but as pointed out there, that gives objects rather than page numbers, for instance [{'/Title': '1.\tPreface: Education transformed', '/Left': 88, '/Type': '/XYZ', '/Top': 477.60000, '/Zoom': , '/Page': IndirectObject(17, 0)}]; the '/Page' entry is an IndirectObject, not a page number. – user3084455, Dec 30 '13 at 07:57

score 2 · Answer 1 · answered Dec 30 '13 at 13:11


You could use the cpdf command line tool, and then parse the results:

cpdf -list-bookmarks file.pdf

will produce something like

0 "Purpose" 1 
0 "To help students visually organize similarities and differences between three ideas, objects, or sets." 1 
0 "To increase awareness of relationships between ideas, objects, or sets" 2 open
1 "Teacher Instructions" 3

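where the columns are the bookmark's level in the tree (0 is top level), the bookmark text, and the page number it points to; a trailing "open" marks a bookmark that is shown expanded.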

johnwhitington

Looks promising. How do I use this tool with Python? Can you show a working script? – user3084455 Jan 01 '14 at 13:17
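A minimal sketch of such a script, assuming `cpdf` is on the PATH and prints the format shown in the answer (level, quoted text, page, optional "open"). Newer cpdf releases may add fields or escape quotes inside titles, which this regex does not decode. `parse_bookmarks` and `list_bookmarks` are hypothetical helper names:

```python
import re
import subprocess

# One line of `cpdf -list-bookmarks` output: level, "text", page, then
# anything else (e.g. "open"). The greedy (.*) stops at the last quote
# that is followed by a page number, so titles containing quotes survive.
LINE_RE = re.compile(r'^(\d+)\s+"(.*)"\s+(\d+)\b')


def parse_bookmarks(output):
    """Turn cpdf's text output into (level, title, page) tuples."""
    bookmarks = []
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:  # silently skip lines that are not bookmark entries
            level, title, page = match.groups()
            bookmarks.append((int(level), title, int(page)))
    return bookmarks


def list_bookmarks(pdf_path, cpdf="cpdf"):
    """Run cpdf on `pdf_path` and parse its bookmark listing."""
    result = subprocess.run(
        [cpdf, "-list-bookmarks", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if cpdf fails
    )
    return parse_bookmarks(result.stdout)
```

Usage: `for level, title, page in list_bookmarks("file.pdf"): print(f"{title}, {page}")`. Keeping the parsing separate from the subprocess call makes it easy to test against captured output without cpdf installed.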