When you nest several bookmarks that share the same name, PyPDF2 does not nest them as expected. Below is self-contained Python code that demonstrates the problem (you need three PDF files named a.pdf, b.pdf and c.pdf in the working folder to run it):

from PyPDF2 import PdfFileReader, PdfFileMerger


def main():
    merger = PdfFileMerger()
    first_one = True
    for file in ["a.pdf", "b.pdf", "c.pdf"]:
        print("next row")
        reader = PdfFileReader(file)
        merger.append(reader)
        if first_one:
            child = merger.addBookmark(title="blabla", pagenum=1)
            first_one = False
        else:
            child = merger.addBookmark(title="blabla", pagenum=1, parent=child)

    merger.write("test.pdf")


if __name__ == "__main__":
    main()

I would expect the resulting PDF to have three levels of nested bookmarks:

blabla
    blabla
        blabla

but instead I get:

blabla
    blabla
    blabla

Is there any way to prevent this?

EDIT: I have removed the pagenum variable, as I want all three bookmarks to point to the same page.


1 Answer

This seems to be a bug in the PdfFileMerger.addBookmark() method. There is some detail in PyPDF2 issue #40: https://github.com/mstamy2/PyPDF2/issues/40
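One way to picture how the flat result could arise is a toy model in which the parent bookmark is looked up by its title rather than by identity. This is only a sketch under that assumption, not PyPDF2's actual code: `Node`, `find`, `add` and `build` are hypothetical names made up for the illustration.

```python
class Node:
    """A toy bookmark: a title plus a list of child bookmarks."""

    def __init__(self, title):
        self.title = title
        self.children = []


def find(nodes, match):
    """Depth-first search returning the first node for which match(node) is true."""
    for node in nodes:
        if match(node):
            return node
        found = find(node.children, match)
        if found is not None:
            return found
    return None


def add(outline, title, parent=None, by_title=True):
    """Add a bookmark; the parent is located by title or by identity."""
    node = Node(title)
    if parent is None:
        outline.append(node)
        return node
    if by_title:
        # Assumed buggy behaviour: the first bookmark with the same title wins.
        target = find(outline, lambda n: n.title == parent.title)
    else:
        # Identity lookup: only the exact parent object matches.
        target = find(outline, lambda n: n is parent)
    target.children.append(node)
    return node


def render(nodes, depth=0):
    """Return the outline as indented lines, four spaces per level."""
    lines = []
    for node in nodes:
        lines.append("    " * depth + node.title)
        lines.extend(render(node.children, depth + 1))
    return lines


def build(by_title):
    """Chain three same-named bookmarks, each under the previous one."""
    outline = []
    parent = None
    for _ in range(3):
        parent = add(outline, "blabla", parent, by_title)
    return render(outline)


print("\n".join(build(by_title=True)))
print("\n".join(build(by_title=False)))
```

In this model the title-based lookup attaches the third bookmark to the first one, which reproduces the two-level output from the question, while the identity-based lookup gives the expected three levels.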

Below is a workaround using PdfFileWriter and its addBookmark() method. With it I get three nested bookmarks with the same name, all pointing to the same page:

blabla
    blabla
        blabla

Code using the PdfFileWriter workaround:

from PyPDF2 import PdfFileReader, PdfFileWriter


def main():
    writer = PdfFileWriter()
    pagenum = 0  # every bookmark points to the first page
    parent = None  # None makes the first bookmark top-level
    for file in ["a.pdf", "b.pdf", "c.pdf"]:
        print("next row")
        reader = PdfFileReader(file)
        writer.appendPagesFromReader(reader)
        # Nest each new bookmark under the one added just before it.
        parent = writer.addBookmark(
            title="blabla", pagenum=pagenum, parent=parent
        )

    with open("test.pdf", "wb") as d:
        writer.write(d)


if __name__ == "__main__":
    main()

Alternatively, I had a go at modifying the PyPDF2 library to resolve this issue, although I'm not very experienced with Python, so I may have introduced other issues! I have submitted a pull request to the maintainers; until it is merged, you can clone my fork and install PyPDF2 from there:

git clone https://github.com/khalida/PyPDF2.git
cd PyPDF2
python setup.py sdist
sudo -H pip uninstall -y PyPDF2
sudo -H pip install dist/PyPDF2-1.26.0.tar.gz

After that you should be able to get the nesting you want from PdfFileMerger.addBookmark(). I've tested it for the case above, but haven't done any testing beyond that.
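To check the nesting without opening the file in a viewer, something like the sketch below might help. It assumes that PdfFileReader.getOutlines() returns child bookmarks as a nested list placed right after their parent, which is my understanding of PyPDF2 1.26 but worth confirming on your version. `outline_depth` and `depth_of_file` are hypothetical helper names, not part of PyPDF2.

```python
def outline_depth(outlines):
    """Return the deepest nesting level in a getOutlines()-style nested list."""
    # Any non-list entry is a bookmark sitting at this level.
    depth = 1 if any(not isinstance(item, list) for item in outlines) else 0
    for item in outlines:
        if isinstance(item, list):
            # A nested list holds the children of the preceding bookmark.
            depth = max(depth, 1 + outline_depth(item))
    return depth


def depth_of_file(path):
    """Open a PDF and report how deeply its bookmarks are nested."""
    # Imported here so outline_depth() can be used without PyPDF2 installed.
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as f:
        return outline_depth(PdfFileReader(f).getOutlines())
```

Under that assumption, depth_of_file("test.pdf") should report 3 when the three bookmarks are properly nested and 2 when the last two ended up as siblings.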

  • Yes, I intended to have them all on the same page, hence not incrementing `pageNum`. You are confirming what I got: it seems there is no way to have an unlimited number of same-name bookmarks in PyPDF2. But is that a feature of the `pdf` format or of the library? – Chapo Mar 27 '17 at 08:33
  • Ah, my bad. I couldn't think why anyone would want 3 nested bookmarks all pointing to the same page. As far as I can tell this is a bug in the `PdfFileMerger.addBookmark()` method. More [here](https://github.com/mstamy2/PyPDF2/issues/40). I'll update my answer with a work-around. – kabdulla Mar 27 '17 at 10:09
  • from your link `The reason I'd rather use PdfFileMerger in this particular application is that PdfFileWriter seems to require all source files to remain open until the output file is written, which results in prohibitive memory usage.`. The same applies to me in that case but your solution works for my question so I'll validate it nonetheless. Thanks for your help. – Chapo Mar 29 '17 at 02:32
  • @Chapo I have gone one better and modified the PyPDF2 library to resolve this bug (although not very experienced at python so may have introduced others). I'll add this to the answer. – kabdulla Mar 30 '17 at 02:11
  • And you get the bounty ! Booyakasha. – Chapo Mar 31 '17 at 03:17