I need to programmatically analyze and combine several hundred PDF documents and link their pages together in specialized ways. Each PDF contains text at every spot where a link belongs, indicating what that link should point to. I'm using pdfminer to extract the location and text of these spots; now I just need to actually create the links.
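For the extraction half, pdfminer reports each layout element's `bbox` as `(x0, y0, x1, y1)` in PDF points with the origin at the bottom-left of the page. That is the same coordinate space a link annotation's `/Rect` uses, so the boxes can usually be passed through unchanged. The sketch below assumes pdfminer.six's `extract_pages()` and a made-up placeholder convention (`[[link:12]]` meaning "link to page 12"); `LINK_PATTERN`, `LinkSpec` and `find_link_boxes` are hypothetical names, and the real marker format will differ.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Hypothetical placeholder convention: text such as "[[link:12]]" marks a
# spot that should link to page 12 (1-based). Replace with the real format.
LINK_PATTERN = re.compile(r"\[\[link:(\d+)\]\]")


class LinkSpec(NamedTuple):
    page_index: int                           # 0-based page holding the link
    rect: Tuple[float, float, float, float]   # (x0, y0, x1, y1), PDF points
    target_index: int                         # 0-based destination page


def find_link_boxes(pages: Iterable) -> List[LinkSpec]:
    """Collect link placeholders from pdfminer layout pages.

    `pages` is expected to look like the output of pdfminer.six's
    extract_pages(): an iterable of pages, each iterable over layout
    elements. Only top-level elements exposing get_text() and bbox are
    inspected; figures, lines and images are skipped.
    """
    specs: List[LinkSpec] = []
    for page_index, page in enumerate(pages):
        for element in page:
            if not hasattr(element, "get_text"):
                continue  # not a text container
            match = LINK_PATTERN.search(element.get_text())
            if match:
                target_index = int(match.group(1)) - 1  # convert to 0-based
                specs.append(LinkSpec(page_index, tuple(element.bbox), target_index))
    return specs
```

Usage, assuming pdfminer.six is installed: `from pdfminer.high_level import extract_pages`, then `specs = find_link_boxes(extract_pages("input.pdf"))`. Text nested deeper than the top level (for example inside figures) would need a recursive walk.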
From my research, PyPDF2 should be able to do this. At the very least, it has a seemingly straightforward `addLink` method that claims to do the job. I just can't get it to work.
```python
from PyPDF2 import PdfFileWriter
from PyPDF2.pdf import RectangleObject

out = PdfFileWriter()
out.insertBlankPage(800, 1000)
out.insertBlankPage(800, 1000)

# rect = [400, 400, 600, 600]  # This doesn't seem to work either
rect = RectangleObject([400, 400, 600, 600])
out.addLink(0, 1, rect)  # link from first to second page

with open(r'C:\temp\test.pdf', 'wb') as outf:
    out.write(outf)
```
The code above produces a beautiful two-page PDF with nothing in it, at least as far as I can tell. Does anyone know how to accomplish this, or can anyone at least point out where I'm going wrong?
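One way to narrow down "nothing in it" is to check whether the link annotation was written at all, or was written but is invisible. In PDF, a link's `/Border` array is `[horizontal-radius vertical-radius width]`, so `[0 0 0]` means no outline is drawn even though the clickable area exists. PyPDF2 1.x's `addLink` appears to fall back to such a zero-width border when no `border` argument is passed, so hovering over or clicking inside the rectangle is worth trying. The hypothetical helper `describe_links` below scans the raw bytes with regular expressions; this only works when objects are stored uncompressed (as PyPDF2 1.x appears to write them), and a file using compressed object streams would need a real parser.

```python
import re
from typing import Dict

# A dictionary declaring itself a link annotation, e.g. "/Subtype /Link".
LINK_RE = re.compile(rb"/Subtype\s*/Link\b")
# The contents of a /Border array; a nested dash array is not handled.
BORDER_RE = re.compile(rb"/Border\s*\[([^\]]*)\]")


def describe_links(pdf_bytes: bytes) -> Dict[str, object]:
    """Count link annotations and list their /Border values.

    Assumes uncompressed objects; returns zero links for files whose
    annotations live inside compressed object streams.
    """
    count = len(LINK_RE.findall(pdf_bytes))
    borders = [raw.decode("ascii").split() for raw in BORDER_RE.findall(pdf_bytes)]
    return {"link_annotations": count, "borders": borders}
```

Usage: `describe_links(open(r'C:\temp\test.pdf', 'rb').read())`. A result of one link with a border of `['0', '0', '0']` would suggest the link exists but is simply not drawn; if so, passing a non-zero width through `addLink`'s optional `border` argument may make it visible.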
A solution doesn't have to use PyPDF2, as long as the library is freely licensed. Strictly speaking, Python isn't even a requirement, but it would be nice to fit this into my current structure without hacking another language onto it.
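Since other approaches are acceptable, it may help to see what a working internal link looks like at the file level. The standard-library sketch below writes a two-page PDF by hand whose first page carries a `/Link` annotation with a `/Dest` pointing at the second page and a visible outline (`/Border [0 0 1]`, colored blue via `/C [0 0 1]`). The function name `build_linked_pdf`, the object numbering, and the defaults (borrowed from the question's 800 by 1000 pages and rectangle) are illustrative only; this is not a substitute for a PDF library when merging hundreds of real files, but it shows the annotation dictionary such a library would need to produce.

```python
from typing import Tuple


def build_linked_pdf(width: int = 800, height: int = 1000,
                     rect: Tuple[float, float, float, float] = (400, 400, 600, 600)) -> bytes:
    """Return a two-page PDF whose first page links to the second.

    Objects: 1 Catalog, 2 Pages, 3 first page, 4 second page, 5 link annotation.
    """
    x0, y0, x1, y1 = rect
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        # Empty /Resources on the Pages node is inherited by both pages.
        b"<< /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 /Resources << >> >>",
        (b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
         b"/Annots [5 0 R] >>" % (width, height)),
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] >>" % (width, height),
        # /Dest jumps to object 4 (second page) and fits it in the window;
        # /Border [0 0 1] draws a 1-point outline in the /C color (blue).
        (b"<< /Type /Annot /Subtype /Link /Rect [%.2f %.2f %.2f %.2f] "
         b"/Border [0 0 1] /C [0 0 1] /Dest [4 0 R /Fit] >>" % (x0, y0, x1, y1)),
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_start))
    return bytes(out)
```

Writing `build_linked_pdf()` to a file with `open("linked.pdf", "wb").write(...)` should give a document where clicking the outlined box on page one jumps to page two. Hand-writing objects keeps the example dependency-free; for real documents, the same `/Annots` entry and link dictionary would be added through whichever freely licensed library handles the merging.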