Current technology stack

  • img2pdf==0.4.4
  • pikepdf==7.1.2
  • Python 3.10
  • Ubuntu 22.04

The requirement

A PDF file (let's call it static.pdf) exists on disk. Another PDF (let's call it dynamic.pdf) is generated dynamically in memory with the img2pdf library, depending on some user input parameters.

The task is to concatenate these two PDFs into a single one (static.pdf first, then dynamic.pdf) and send it as an email attachment via the SMTP library.

Current solution

This is based on the pikepdf documentation.

  • Dump dynamic.pdf to disk
  • Read static.pdf from disk with pikepdf
  • Read dynamic.pdf from disk with pikepdf
  • Concatenate them with the list-like API provided by pikepdf; let's call the result final.pdf
  • Dump final.pdf to disk with the pikepdf API
  • Read it from disk with open(file='final.pdf', mode='rb') as bytes
  • Attach the bytes to the email message
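
For reference, the last step above (attaching the PDF bytes to an email message) can be sketched with the standard library alone. This is a minimal sketch, not the code from my project: the function names, subject line, body text, and the default SMTP host and port are placeholders chosen for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_message_with_pdf(pdf_bytes, sender, recipient, filename="final.pdf"):
    # Hypothetical helper: builds an email carrying the PDF bytes as an attachment.
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Generated PDF"  # placeholder subject
    msg.set_content("The requested PDF is attached.")  # placeholder body
    # The attachment is taken directly from bytes; no file on disk is required.
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


def send_message(msg, host="localhost", port=25):
    # Assumed SMTP host and port; adjust to the actual mail server.
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Since add_attachment takes bytes, the message itself does not care whether those bytes came from a file or from an in-memory buffer.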

What I want

I want to remove all the unnecessary disk I/O: dynamic.pdf is already in memory, and the final result only needs to be attached to the email as bytes (there is no need to persist it on disk). So ideally, the only disk operation should be reading static.pdf.

But I cannot find much information on the pikepdf site about in-memory concatenation. Moreover, I am not certain whether a pikepdf.Pdf object can expose the exact same bytes I would get if I dumped the PDF to disk and then read it back with Python's built-in open function.

So any ideas around this would be helpful, even if they involve other libraries that offer this functionality. The constraints on other libraries are:

  • Plays well with my tech stack (Python on Ubuntu; it also needs to run on Windows)
  • FOSS, and trustworthy enough
Della

1 Answer

According to the documentation for pikepdf.Pdf, the Pdf.open and Pdf.save methods accept a file-like object instead of a filename, so you can use io.BytesIO here.

For example,

import io
import img2pdf
from pikepdf import Pdf

def pdf_from_bytes(data):
    # Wrap the raw bytes in a file-like object so pikepdf can read them from memory.
    return Pdf.open(io.BytesIO(data))

def add_png_to_end(static_pdf_path, png_file_path):
    # Adds a PNG to the end of an existing PDF document and returns the bytes.
    static_pdf = Pdf.open(static_pdf_path)
    # img2pdf.convert returns the PDF as bytes when no output file is given.
    png_pdf = pdf_from_bytes(img2pdf.convert(png_file_path))
    new_pdf = Pdf.new()
    new_pdf.pages.extend(static_pdf.pages)
    new_pdf.pages.extend(png_pdf.pages)
    # Save into an in-memory buffer instead of a file on disk.
    res = io.BytesIO()
    new_pdf.save(res)
    return res.getvalue()
fakedad