Is it possible to input PDF bytes straight into PyPDF2 instead of making a PDF file first?

Question

I am using Linux; printing raw to port 9100 gives me a `bytes` object. Is it possible to pass this straight into PyPDF2, rather than writing a PDF file first and then opening it with PdfFileReader?

Thank you for your time.

Make a stream object out of your bytes with [`io`](https://docs.python.org/3/library/io.html#io.BytesIO) and pass the stream to `PyPDF2.PdfFileReader`. Essentially: `import io, PyPDF2; PyPDF2.PdfFileReader(io.BytesIO(b"your pdf bytes"))`. — Abdou, Dec 13 '17 at 20:50
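The one-liner works because a buffer constructed from bytes starts at position 0. The sketch below uses only the standard library, with a made-up PDF header standing in for real data. It contrasts that with writing into an empty buffer, where the position ends up after the data and has to be rewound before anything can read it:

```python
import io

data = b"%PDF-1.4\n"  # stand-in for real PDF bytes (illustration only)

# Constructing the buffer from bytes: the position starts at 0
from_ctor = io.BytesIO(data)
print(from_ctor.tell())   # 0

# Writing into an empty buffer: the position ends up after the data
written = io.BytesIO()
written.write(data)
print(written.tell())     # 9
print(written.read())     # b'' because nothing is left after the current position

# Rewind, and the data is readable again
written.seek(0)
print(written.read(5))    # b'%PDF-'
```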

score 9 · Accepted Answer · edited Dec 13 '17 at 23:30


PyPDF2.PdfFileReader() defines its first parameter as:

stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.

So you can pass any data to it as long as it can be accessed as a file-like stream. A perfect candidate for that is io.BytesIO(). Write your received raw bytes to it, seek back to position 0, then pass the object to PyPDF2.PdfFileReader() and you're done.
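For the port-9100 case in the question, the "write your received bytes, then seek back to 0" step might look like the sketch below. It assumes the print job arrives over a TCP connection that the sender closes when it is done; `receive_into_stream` is a hypothetical helper name, and only the standard library is used.

```python
import io
import socket


def receive_into_stream(conn: socket.socket, bufsize: int = 4096) -> io.BytesIO:
    """Read from `conn` until the peer closes it, collecting everything in memory."""
    stream = io.BytesIO()
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:          # empty bytes means the sender closed the connection
            break
        stream.write(chunk)    # each write advances the stream position
    stream.seek(0)             # rewind so a PDF reader starts at the first byte
    return stream
```

With a real print job, the returned stream would then be handed to `PyPDF2.PdfFileReader(...)`, for example for a connection accepted from a listening socket bound to port 9100 (that server setup is omitted here). No temporary file is written at any point.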

answered Dec 13 '17 at 20:51 by zwer · edited by Jongware

Can you provide an example? – alias51 Oct 12 '21 at 18:16
@alias51 here is an example: `p = io.BytesIO(content); pdf = PyPDF2.PdfFileReader(p)`, where `content` is the byte representation of a PDF file, e.g. the output of `img2pdf.convert(jpegimag)`. – Adam Hughes Feb 16 '22 at 18:10
No clue why this took me so long to figure out, but +1, this saved the rest of my day. I had a case where multiple writer objects needed to be combined into a new PDF containing the contents of all of them. – ViaTech Jul 13 '23 at 15:58

score 3 · Answer 2 · answered Jan 14 '22 at 10:44

Yes, the first comment is right. Here is a code example that generates PDF bytes without creating a PDF file:

import io
from typing import List

from PyPDF2 import PdfFileReader, PdfFileWriter


def join_pdf(pdf_chunks: List[bytes]) -> bytes:
    # Create an empty PDF writer that will collect all pages
    result_pdf = PdfFileWriter()

    # Iterate over every chunk of PDF bytes
    for chunk in pdf_chunks:
        # Read the bytes through an in-memory stream
        chunk_pdf = PdfFileReader(
            stream=io.BytesIO(      # Create a stream object
                initial_bytes=chunk
            )
        )
        # Add all pages of this chunk to the result
        for page in range(chunk_pdf.getNumPages()):
            result_pdf.addPage(chunk_pdf.getPage(page))

    # Write the merged PDF to an in-memory stream and return its bytes
    response_bytes_stream = io.BytesIO()
    result_pdf.write(response_bytes_stream)
    return response_bytes_stream.getvalue()

Hi, it would be great if you could help us to understand what your code does and how it solves the OP's problem! — Simas Joneliunas, Jan 18 '22 at 06:22
Here we build one large PDF object from a list of encoded PDF files and return it as bytes, using io.BytesIO instead of creating the file itself. — Block2busted, Jan 18 '22 at 07:15

score 1 · Answer 3 · answered May 10 '22 at 16:28


A few years later, I've added this to the PyPDF2 docs:

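from io import BytesIO

from PyPDF2 import PdfFileReader, PdfFileWriter

# Prepare example
with open("example.pdf", "rb") as fh:
    bytes_stream = BytesIO(fh.read())

# Read from bytes_stream
reader = PdfFileReader(bytes_stream)

# Write to bytes_stream
writer = PdfFileWriter()
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
    # Grab the bytes before the with-block closes the stream
    pdf_bytes = bytes_stream.getvalue()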

answered by Martin Thoma

Is it possible to assign the bytes of `page.get_contents().get_data()` to another PDF file? – Life after Guest May 05 '23 at 13:49
