I have been searching high and low (pyPdf, PyPDF2, pdfminer and the like). All I want is to read the bookmarks of a PDF and their corresponding pages, so that if the bookmark "chapter 1" is on page 5 and I write print(bookmarks), it prints "chapter 1, 5" or something along those lines. Any ideas? Thanks!

user3084455
  • Does this help you? http://stackoverflow.com/questions/8329748/how-to-get-bookmarks-page-number (first hit on Google...) – Jongware Dec 30 '13 at 01:05
  • Thanks, but as pointed out there, that doesn't give page numbers, just objects. For instance, I get "[{'/Title': '1.\tPreface: Education transformed', '/Left': 88, '/Type': '/XYZ', '/Top': 477.60000, '/Zoom': , '/Page': IndirectObject(17, 0)}]", which has an indirect object reference but not the page number. – user3084455 Dec 30 '13 at 07:57

1 Answer

You could use the cpdf command-line tool and then parse its output:

cpdf -list-bookmarks file.pdf

will produce something like

0 "Purpose" 1 
0 "To help students visually organize similarities and differences between three ideas, objects, or sets." 1 
0 "To increase awareness of relationships between ideas, objects, or sets" 2 open
1 "Teacher Instructions" 3

where the columns are the level in the bookmark tree, the text of the bookmark, and the page number it points to.
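
To get from that output to the "chapter 1, 5" style asked for in the question, the parsing step could look roughly like the Python sketch below. It assumes cpdf is installed and on the PATH, that each line follows the level, quoted title, page layout shown above, and that any quote inside a title is backslash-escaped (an assumption, not something checked here). The names parse_bookmarks and list_bookmarks are made up for this example.

```python
import re
import subprocess

# One line of output: level, quoted title, page number, then optional
# trailing fields such as "open". Backslash-escaped quotes inside the
# title are tolerated (assumed escaping style).
LINE_RE = re.compile(r'^(\d+)\s+"((?:[^"\\]|\\.)*)"\s+(\d+)(.*)$')


def parse_bookmarks(text):
    """Turn bookmark listing text into (level, title, page) tuples."""
    bookmarks = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unexpected lines
        level, title, page, _rest = match.groups()
        bookmarks.append((int(level), title, int(page)))
    return bookmarks


def list_bookmarks(pdf_path):
    """Run `cpdf -list-bookmarks` on pdf_path and parse what it prints."""
    result = subprocess.run(
        ["cpdf", "-list-bookmarks", pdf_path],
        capture_output=True,
        text=True,
        check=True,  # raise if cpdf exits with an error
    )
    return parse_bookmarks(result.stdout)
```

With this in place, something like `for level, title, page in list_bookmarks("file.pdf"): print(f"{title}, {page}")` should print one "title, page" pair per bookmark, e.g. "Purpose, 1" for the first line of the sample output.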

johnwhitington