I have searched high and low (pyPdf, PyPDF2, pdfminer, and the like). All I want is to read the bookmarks of a PDF together with the pages they point to, so that if the bookmark "chapter 1" is on page 5, then print(bookmarks) prints "chapter 1, 5" or something along those lines. Any ideas? Thanks!
-
Does this help you? http://stackoverflow.com/questions/8329748/how-to-get-bookmarks-page-number (first hit on Google...) – Jongware Dec 30 '13 at 01:05
-
Thanks, but as pointed out there, that doesn't give page numbers, just objects. For instance, [{'/Title': '1.\tPreface: Education transformed', '/Left': 88, '/Type': '/XYZ', '/Top': 477.60000, '/Zoom': , '/Page': IndirectObject(17, 0)}] gives the page as an indirect object, not as a page number. – user3084455 Dec 30 '13 at 07:57
1 Answer
You could use the cpdf command line tool, and then parse the results:
cpdf -list-bookmarks file.pdf
will produce something like
0 "Purpose" 1
0 "To help students visually organize similarities and differences between three ideas, objects, or sets." 1
0 "To increase awareness of relationships between ideas, objects, or sets" 2 open
1 "Teacher Instructions" 3
where the columns are the bookmark's level in the tree, the text of the bookmark, and the page number it points to. Some lines carry a trailing "open" flag after the page number.
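To drive this from Python, one option is to call cpdf with subprocess and parse each output line with a regular expression. The following is only a rough sketch: it assumes cpdf is installed and on your PATH, that the output follows the level / "title" / page layout shown above, and the names parse_bookmarks and list_bookmarks are made up for illustration.

```python
import re
import subprocess

# One line of `cpdf -list-bookmarks` output, assuming the layout shown above:
# level, quoted title, page number, then optional extras such as "open".
BOOKMARK_LINE = re.compile(r'^(\d+)\s+"(.*)"\s+(\d+)(?:\s+(.*))?$')


def parse_bookmarks(text):
    """Return a list of (level, title, page) tuples parsed from cpdf output."""
    bookmarks = []
    for line in text.splitlines():
        match = BOOKMARK_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        level, title, page = match.group(1, 2, 3)
        bookmarks.append((int(level), title, int(page)))
    return bookmarks


def list_bookmarks(pdf_path, cpdf="cpdf"):
    """Run cpdf on pdf_path (hypothetical helper) and return parsed bookmarks."""
    result = subprocess.run(
        [cpdf, "-list-bookmarks", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if cpdf exits with an error
    )
    return parse_bookmarks(result.stdout)


# Example usage (needs cpdf and a real file, so it is left commented out):
# for level, title, page in list_bookmarks("file.pdf"):
#     print(f"{title}, {page}")
```

The parsing is kept separate from the subprocess call so that it can be tried on saved output without cpdf installed. Titles containing unusual escaping may need a more careful parser than this regex.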

johnwhitington
-
Looks promising. How do I use this tool with Python? Can you show a working script? – user3084455 Jan 01 '14 at 13:17