7

I am using Linux; printing raw to port 9100 returns a "bytes" object. Is it possible to pass these bytes straight into PyPDF2, rather than writing a PDF file first and then reading it with PdfFileReader?

Thank you for your time.

neophlegm
TheSadPrinter
  • 3
    Make a stream object out of your bytes with [`io`](https://docs.python.org/3/library/io.html#io.BytesIO) and pass the stream to `PyPDF2.PdfFileReader`. Essentially: `import io, PyPDF2; PyPDF2.PdfFileReader(io.BytesIO(b"your pdf bytes"))`. – Abdou Dec 13 '17 at 20:50

3 Answers

9

PyPDF2.PdfFileReader() defines its first parameter as:

stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.

So you can pass any data to it as long as it can be accessed as a file-like stream. A perfect candidate for that is io.BytesIO(). Write your received raw bytes to it, then seek back to 0, pass the object to PyPDF2.PdfFileReader() and you're done.
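
The steps above can be sketched with the standard library alone. The helper name `bytes_to_stream` and the placeholder payload are assumptions made for illustration; the returned object is what would then be handed to `PyPDF2.PdfFileReader()` (newer PyPDF2 releases name this class `PdfReader`).

```python
import io


def bytes_to_stream(raw: bytes) -> io.BytesIO:
    """Wrap raw bytes in a seekable, file-like stream (hypothetical helper)."""
    stream = io.BytesIO()
    stream.write(raw)   # write the received bytes
    stream.seek(0)      # rewind so a reader starts at the first byte
    return stream


# Placeholder payload; real input would be the complete bytes of a PDF.
raw = b"%PDF-1.4 placeholder, not a valid PDF"
stream = bytes_to_stream(raw)

# The stream supports the read and seek methods the documentation asks for.
header = stream.read(5)
stream.seek(0)

# With PyPDF2 installed and real PDF bytes, the next step would be:
# reader = PyPDF2.PdfFileReader(stream)
```

Passing the bytes to the constructor, as in `io.BytesIO(raw)`, gives an equivalent stream whose position already starts at 0.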

Jongware
zwer
  • Can you provide an example? – alias51 Oct 12 '21 at 18:16
  • @alias51 Here is an example: `p = io.BytesIO(content); pdf = PyPDF2.PdfFileReader(p)`, where content is the byte representation of a PDF file, e.g. the output of img2pdf.convert(jpegimag). – Adam Hughes Feb 16 '22 at 18:10
  • No clue why this took me so long to figure out, but +1... this saved the rest of my day. I had a case where multiple writer objects needed to be combined into a new PDF containing all of their content. – ViaTech Jul 13 '23 at 15:58
3

Yes, the first comment is right. Here is a code example that generates PDF bytes without creating a PDF file:

import io
from typing import List

from PyPDF2 import PdfFileReader, PdfFileWriter


def join_pdf(pdf_chunks: List[bytes]) -> bytes:
    # Create an empty PDF writer that collects the pages of every chunk
    result_pdf = PdfFileWriter()

    # Iterate over all PDF byte chunks
    for chunk in pdf_chunks:
        # Read the bytes as a PDF
        chunk_pdf = PdfFileReader(
            stream=io.BytesIO(      # Create a stream object
                initial_bytes=chunk
            )
        )
        # Add all pages of this chunk to the result
        for page in range(chunk_pdf.getNumPages()):
            result_pdf.addPage(chunk_pdf.getPage(page))

    # Write the combined PDF into an in-memory bytes stream
    response_bytes_stream = io.BytesIO()
    result_pdf.write(response_bytes_stream)
    return response_bytes_stream.getvalue()
Block2busted
  • 1
    Hi, it would be great if you could help us to understand what your code does and how it solves the OP's problem! – Simas Joneliunas Jan 18 '22 at 06:22
  • Here we build one large PDF object from a list of encoded PDF files and return it as bytes, using io.BytesIO so that no file is created. – Block2busted Jan 18 '22 at 07:15
1

A few years later, I've added this to the PyPDF2 docs:

from io import BytesIO

from PyPDF2 import PdfFileReader, PdfFileWriter

# Prepare example
with open("example.pdf", "rb") as fh:
    bytes_stream = BytesIO(fh.read())

# Read from bytes_stream
reader = PdfFileReader(bytes_stream)

# Write to bytes_stream
writer = PdfFileWriter()
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
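
One detail in the write half is easy to miss: leaving the `with` block closes the stream, so the written bytes have to be copied out with `getvalue()` before that happens. The sketch below shows this with plain `io.BytesIO`; the placeholder payload stands in for `writer.write(bytes_stream)`, and the variable names are illustrative assumptions.

```python
from io import BytesIO

with BytesIO() as bytes_stream:
    # Stand-in for writer.write(bytes_stream); the payload is a placeholder.
    bytes_stream.write(b"placeholder bytes")
    pdf_bytes = bytes_stream.getvalue()  # copy the data while the stream is open

# After the block the stream is closed and can no longer be read.
try:
    bytes_stream.getvalue()
    readable_after_close = True
except ValueError:
    readable_after_close = False
```

If the bytes are needed beyond the block, another option is to create the `BytesIO` object without `with` and keep it open for as long as it is used.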
Martin Thoma