I am using Linux; printing raw to port 9100 returns a "bytes" type. I was wondering if it is possible to go from this straight into PyPDF2, rather than make a pdf file first and using method PdfFileReader?
Thank you for your time.
I am using Linux; printing raw to port 9100 returns a "bytes" type. I was wondering if it is possible to go from this straight into PyPDF2, rather than make a pdf file first and using method PdfFileReader?
Thank you for your time.
PyPDF2.PdfFileReader()
defines its first parameter as:
stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.
So you can pass any data to it as long as it can be accessed as a file-like stream. A perfect candidate for that is io.BytesIO()
. Write your received raw bytes to it, then seek back to 0
, pass the object to PyPDF2.PdfFileReader()
and you're done.
Yeah, first comment right. Here is code-example for generate pdf-bytes without creating pdf-file:
import io
from typing import List
from PyPDF2 import PdfFileReader, PdfFileWriter
def join_pdf(pdf_chunks: List[bytes]) -> bytes:
# Create empty pdf-writer object for adding all pages here
result_pdf = PdfFileWriter()
# Iterate for all pdf-bytes
for chunk in pdf_chunks:
# Read bytes
chunk_pdf = PdfFileReader(
stream=io.BytesIO( # Create steam object
initial_bytes=chunk
)
)
# Add all pages to our result
for page in range(chunk_pdf.getNumPages()):
result_pdf.addPage(chunk_pdf.getPage(page))
# Writes all bytes to bytes-stream
response_bytes_stream = io.BytesIO()
result_pdf.write(response_bytes_stream)
return response_bytes_stream.getvalue()
A few years later, I've added this to the PyPDF2 docs:
from io import BytesIO
# Prepare example
with open("example.pdf", "rb") as fh:
bytes_stream = BytesIO(fh.read())
# Read from bytes_stream
reader = PdfFileReader(bytes_stream)
# Write to bytes_stream
writer = PdfFileWriter()
with BytesIO() as bytes_stream:
writer.write(bytes_stream)