12

I'm using pyPdf to merge several PDF files into one. This works great, but I also need to add a table of contents/outlines/bookmarks to the generated PDF file.

pyPdf seems to have only read support for outlines. ReportLab would allow me to create them, but the open-source version does not support loading PDF files, so it can't be used to add outlines to an existing file.

Is there any way I can add outlines to an existing PDF using Python, or any library that would allow that?

jphoude
  • Off the top of my head, I think there are at least non-Python solutions to this so that you could create your PDF and then run a command with some options to specify what you want for the outline. Not great, but it should probably at least let you get the job done. – Gordon Seidoh Worley May 28 '11 at 12:59
  • This may or may not work for you, but try [pdfrecycle](http://www.florian-diesch.de/software/pdfrecycle/), which claims to support index and bookmark generation. – secumind Jun 02 '12 at 01:26

4 Answers

4

I made a Python library just for adding an outline to an existing PDF file: https://github.com/yutayamamoto/pdfoutline

Yuta
3

It looks like pypdf can do the job. See the add_outline_item method in the documentation.
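
A minimal sketch of how that might look while merging, assuming a recent pypdf release in which `PdfWriter.add_outline_item(title, page_number)` is available. The helper names `outline_title` and `merge_with_outline` are made up for this example, and using the file name as the bookmark title is just one possible choice.

```python
from pathlib import Path


def outline_title(path):
    """Bookmark title for a source file: its name without the extension."""
    return Path(path).stem


def merge_with_outline(paths, output_path):
    """Merge the given PDFs and add one top-level outline entry per file."""
    # Imported here so the helper above can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in paths:
        reader = PdfReader(path)
        # 0-based index of this file's first page in the merged document.
        first_page = len(writer.pages)
        for page in reader.pages:
            writer.add_page(page)
        writer.add_outline_item(outline_title(path), first_page)

    with open(output_path, "wb") as f:
        writer.write(f)
```

Calling `merge_with_outline(["a.pdf", "b.pdf"], "merged.pdf")` should then produce a file with two bookmarks, "a" and "b", each pointing at the first page of the corresponding source file.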

Martin Thoma
Watusimoto
2

We had a similar problem in WeasyPrint: cairo produces the PDF files but does not support bookmarks/outlines or hyperlinks. In the end we bit the bullet, read the PDF spec, and did it ourselves.

WeasyPrint’s pdf.py has a simple PDF parser and writer that can add or override PDF "objects" in an existing document. It uses the PDF "update" mechanism and only appends at the end of the file.

This module was made for internal use only but I’m open to refactoring it to make it easier to use in other projects.

However, the parser takes a few shortcuts and cannot parse all valid PDF files. It may need to be adapted if PyPDF’s output is not as nice as cairo’s. From the module’s docstring:

Rather than trying to parse any valid PDF, we make some assumptions that hold for cairo in order to simplify the code:

  • All newlines are '\n', not '\r' or '\r\n'
  • Except for number 0 (which is always free) there is no "free" object.
  • Most white space separators are made of a single 0x20 space.
  • Indirect dictionary objects do not contain '>>' at the start of a line except to mark the end of the object, followed by 'endobj'. (In other words, '>>' markers for sub-dictionaries are indented.)
  • The Page Tree is flat: all kids of the root page node are page objects, not page tree nodes.
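
For illustration, this is roughly what the objects for a flat list of bookmarks look like once you go down to the PDF level. It is not WeasyPrint's code, only a sketch based on the document outline structure in the PDF spec. The function names, the `/Fit` destinations, and the way object numbers are assigned are assumptions made for this example. The document catalog would additionally need an `/Outlines` entry pointing at the root object, and the objects would still have to be appended to the file with an updated cross-reference table.

```python
def pdf_text_string(text):
    """Encode text as a PDF hex string in UTF-16BE with a byte order mark."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def build_flat_outline(bookmarks, page_obj_nums, first_free_obj_num):
    """Return {object_number: dictionary_source} for a one-level outline.

    bookmarks: list of (title, page_index) pairs, in display order.
    page_obj_nums: object number of each page, indexed by page_index.
    first_free_obj_num: first object number not yet used in the file.
    """
    if not bookmarks:
        raise ValueError("at least one bookmark is required")

    root = first_free_obj_num
    item_nums = [root + 1 + i for i in range(len(bookmarks))]

    objects = {
        root: (
            f"<< /Type /Outlines /First {item_nums[0]} 0 R "
            f"/Last {item_nums[-1]} 0 R /Count {len(bookmarks)} >>"
        )
    }
    for i, (title, page_index) in enumerate(bookmarks):
        parts = [
            f"/Title {pdf_text_string(title)}",
            f"/Parent {root} 0 R",
            # Jump to the page and fit it in the window.
            f"/Dest [{page_obj_nums[page_index]} 0 R /Fit]",
        ]
        # Siblings form a doubly linked list through /Prev and /Next.
        if i > 0:
            parts.append(f"/Prev {item_nums[i - 1]} 0 R")
        if i < len(bookmarks) - 1:
            parts.append(f"/Next {item_nums[i + 1]} 0 R")
        objects[item_nums[i]] = "<< " + " ".join(parts) + " >>"
    return objects
```
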
Simon Sapin
0

pikepdf seems to have exactly what you need. I haven't used it myself, but I came across it when I was researching a similar use case!

To automatically add an entry for each file in a merged document:

from glob import glob

from pikepdf import Pdf, OutlineItem

pdf = Pdf.new()
page_count = 0  # index of the next file's first page in the merged document

with pdf.open_outline() as outline:
    for file in glob('*.pdf'):
        src = Pdf.open(file)
        # Bookmark this file's first page, using the file name as the title.
        oi = OutlineItem(file, page_count)
        outline.root.append(oi)
        page_count += len(src.pages)
        pdf.pages.extend(src.pages)

pdf.save('merged.pdf')
mcmuffin6o