I have a fitz.fitz.Page object that I want to save as bytes so that I can later upload it to blob storage, but I have been unable to find out how to do this in the fitz documentation.

I already have the code to upload bytes to blob storage; I just need help with the function store_fitz_page_as_bytes().

Mind you, I am not asking about saving a fitz.fitz.Document object as bytes; I already know how to do that (by using the save method).
I want to store each page of a fitz.fitz.Document (each page is of type fitz.fitz.Page) as bytes.

import fitz


def store_fitz_page_as_bytes(fitz_page: fitz.fitz.Page)->bytes:
    '''
    need help here to convert fitz_page to bytes and return it. 
    -This needs to be done without any local storage
    -fitz_page.select([0]) is not an option for me as I need to do this for all pages
    '''

def upload_bytes_to_azure_blob(pdf_bytes):
    '''
    I already know this so don't really want to derail the conversation to this subject. Just mentioning this so that the forum knows my purpose
    '''

#read a 200 page pdf as fitz.fitz.Document
fitz_doc = fitz.open("test.pdf")
print(type(fitz_doc))

#just get the first item from fitz_doc. It will be the first page from the 200 page pdf and will be of type fitz.fitz.Page
fitz_page = fitz_doc[0]
print(type(fitz_page))

#convert fitz_page from above to bytes
pdf_bytes = store_fitz_page_as_bytes(fitz_page)

#upload bytes to blob
upload_bytes_to_azure_blob(pdf_bytes) 

newbie101
  • @KJ Yes, I managed to solve it similarly, using two `fitz.Document` objects instead of trying to save a `fitz.Page` as bytes. `Object#1` is the full PDF and `Object#2` is initialized as empty. I then use `Object#2.insert_pdf()` to insert just a single page from `Object#1` into `Object#2`. To save `Object#2` as bytes, I used `Object#2.save()`. I took guidance from https://github.com/pymupdf/PyMuPDF/issues/880 – newbie101 Jul 19 '23 at 04:51

1 Answer

Want to store each page in fitz.fitz.Document (each page will be of type fitz.fitz.Page) as bytes.

You can use the code below to render each page of the PDF document to PNG image bytes and upload them to Azure Blob Storage as separate blobs. Note that get_pixmap() produces a raster image of each page, not a single-page PDF.

Code:

import fitz
from azure.storage.blob import BlobServiceClient

def store_fitz_page_as_bytes(fitz_page: fitz.Page) -> bytes:
    # Render the page to a raster image (RGB, no alpha channel)
    pix = fitz_page.get_pixmap(alpha=False)

    # Encode the pixmap as PNG; tobytes() already returns a bytes object,
    # so no intermediate BytesIO stream is needed
    return pix.tobytes("png")

def upload_bytes_to_azure_blob(pdf_bytes, blob_name):
    connection_string = "Your-connection-string"
    container_name = "test1"
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = blob_service_client.get_container_client(container_name)

    container_client.upload_blob(name=blob_name, data=pdf_bytes, overwrite=True)

fitz_doc = fitz.open("test890.pdf")  # this test file has 6 pages

for page_number, page in enumerate(fitz_doc):
    pdf_bytes = store_fitz_page_as_bytes(page)
    # The bytes are PNG image data, so name the blob accordingly
    blob_name = f"page_{page_number+1}.png"
    upload_bytes_to_azure_blob(pdf_bytes, blob_name)

The upload_bytes_to_azure_blob function uploads the bytes to Azure Blob Storage using the supplied connection string and container name.

The for loop iterates over each page of the PDF document (fitz_doc) and calls store_fitz_page_as_bytes to convert it to bytes. The bytes for each page are then uploaded to Azure Blob Storage under a separate blob name that includes the page number.
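
The upload loop described above creates a new BlobServiceClient for every page. One possible variation, sketched here under assumptions, is to build the container client once and pass it in. The function name upload_pages and its extension parameter are hypothetical; the only thing it assumes about container_client is an upload_blob(name=..., data=..., overwrite=...) method, which the Azure ContainerClient provides.

```python
from typing import Iterable, List


def upload_pages(container_client, page_payloads: Iterable[bytes],
                 extension: str) -> List[str]:
    """Upload one blob per page and return the blob names used.

    Hypothetical helper: `container_client` is any object exposing
    upload_blob(name=..., data=..., overwrite=...). `extension` should
    match whatever the payload bytes actually contain.
    """
    blob_names = []
    # Number blobs from 1 so names line up with human page numbers
    for page_number, payload in enumerate(page_payloads, start=1):
        blob_name = f"page_{page_number}.{extension}"
        container_client.upload_blob(name=blob_name, data=payload, overwrite=True)
        blob_names.append(blob_name)
    return blob_names
```

With Azure, the client would be created once, for example via BlobServiceClient.from_connection_string(connection_string).get_container_client(container_name), and the payloads could come from a generator such as (store_fitz_page_as_bytes(page) for page in fitz_doc).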

Venkatesan