Document type to bytes in Python

Question

I am creating a PDF in Python using the borb library. Everything works well and I've found it really easy to work with.

However, I now have an object that is of type 'Document' in memory. I don't want to actually save this file anywhere. I just want to upload it to S3.

What is the best way to convert this file to bytes so that it can be uploaded to S3?

Below is the actual code, as well as some things I've tried.

async def create_invoice_pdf():
    # Create document
    pdf = Document()

    # Add page
    page = Page()
    pdf.append_page(page)

    page_layout = SingleColumnLayout(page)
    page_layout.vertical_margin = page.get_page_info().get_height() * Decimal(0.02)

    page_layout.add(
        Image(
            "xxx.png",
            width=Decimal(250),
            height=Decimal(60),
        )
    )

    # Invoice information table
    page_layout.add(_build_invoice_information())

    # Empty paragraph for spacing
    page_layout.add(Paragraph(" "))

    # Itemized description
    page_layout.add(_build_itemized_description_table())

    page_layout.add(Paragraph(" "))

    # Payment itemized description
    page_layout.add(_build_payment_itemized_description_table())

    upload_to_aws(pdf, "Borb/Test", INVOICE_BUCKET, "application/pdf")

Result:

TypeError: a bytes-like object is required, not 'Document'

Using

pdf_bytes = bytearray(pdf)
upload_to_aws(pdf_bytes, "Borb/Test", INVOICE_BUCKET, "application/pdf")

Result:

TypeError: 'Name' object cannot be interpreted as an integer

Using

s = str(pdf).encode()
pdf_bytes = bytearray(s)
upload_to_aws(pdf_bytes, "Borb/Test", INVOICE_BUCKET, "application/pdf")

Result:

The file is uploaded, but it is corrupted and cannot be opened after download.
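
A quick way to see why this upload is unreadable: `str(obj).encode()` produces the bytes of the object's text representation, not a serialized PDF. Below is a rough sketch of that check. `FakeDocument` is a hypothetical stand-in (not borb's `Document`), and `looks_like_pdf` is an illustrative helper that only tests for the `%PDF-` marker a PDF file starts with.

```python
def looks_like_pdf(data: bytes) -> bool:
    # A PDF file starts with the marker "%PDF-" followed by a version number.
    return data.startswith(b"%PDF-")


class FakeDocument(dict):
    """Hypothetical stand-in for an in-memory document object."""


# Encoding the string form only yields the text of the Python repr.
text_bytes = str(FakeDocument(title="Invoice")).encode("utf-8")

print(text_bytes)
print(looks_like_pdf(text_bytes))
print(looks_like_pdf(b"%PDF-1.7\n"))
```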

I am able to save the file locally using:

with open("file.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, pdf)

But I don't actually want to do this.

Any ideas? Thanks in advance.

You have an in-memory Python object. It is not a PDF. There is no way that converting it to bytes is going to work because nothing you have tried *actually creates a PDF*, except the `dumps()` method you say you don't want to use. You can't avoid that method because you need `borb`'s facilities to turn its internal data structures into PDF-style compressed PostScript. The best you can do is create an in-memory file using `io.StringIO`. You need to upload the compressed Postscript representation of your document. You can't do that without first creating it. — BoarGules, May 31 '22 at 08:52
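
One detail worth checking in the comment above: the `open("file.pdf", "wb")` call in the question suggests `PDF.dumps` writes bytes, so the in-memory file would need to be `io.BytesIO` rather than `io.StringIO`. Here is a minimal stdlib-only sketch of the difference. The `%PDF-1.7` payload is just a placeholder, not borb output.

```python
import io

payload = b"%PDF-1.7\n"  # placeholder bytes standing in for serialized PDF data

# A text buffer only accepts str, so writing bytes fails.
error = None
text_buffer = io.StringIO()
try:
    text_buffer.write(payload)
except TypeError as exc:
    error = exc

# A binary buffer accepts bytes, just like a file opened with mode "wb".
binary_buffer = io.BytesIO()
binary_buffer.write(payload)
data = binary_buffer.getvalue()

print(type(error).__name__)
print(data)
```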

score 2 · Answer 1 · answered May 31 '22 at 12:13

Managed to figure it out:

PDF.dumps does not have to be called inside a with open(...) block. It can write to an in-memory io.BytesIO buffer instead of a file on disk:

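import io

# Write the PDF into an in-memory binary buffer instead of a file on disk
buffer = io.BytesIO()
PDF.dumps(buffer, pdf)
buffer.seek(0)  # rewind so read() returns the whole PDF

upload_to_aws(buffer.read(), "Borb/Test.pdf", INVOICE_BUCKET, "application/pdf")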
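
The same pattern can be wrapped in a small helper. This is a sketch, not borb's API: `render_to_bytes` and `fake_dumps` are hypothetical names, and `fake_dumps` only stands in for `PDF.dumps` so the example runs without borb installed.

```python
import io
from typing import Any, BinaryIO, Callable


def render_to_bytes(dump: Callable[[BinaryIO, Any], None], document: Any) -> bytes:
    """Serialize `document` in memory using a function that writes to a file handle.

    `dump` is assumed to have the same shape as PDF.dumps(file_handle, document).
    """
    buffer = io.BytesIO()
    dump(buffer, document)
    # getvalue() returns the full contents regardless of the current position.
    return buffer.getvalue()


def fake_dumps(file_handle: BinaryIO, document: Any) -> None:
    # Hypothetical stand-in for PDF.dumps; writes placeholder bytes only.
    file_handle.write(b"%PDF-1.7\n")
    file_handle.write(str(document).encode("utf-8"))


pdf_bytes = render_to_bytes(fake_dumps, "hello")
print(type(pdf_bytes).__name__, len(pdf_bytes))
```

With borb this would presumably be called as `render_to_bytes(PDF.dumps, pdf)`, and the returned bytes passed to `upload_to_aws`. Because `getvalue()` returns the whole buffer, no `seek(0)` is needed.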

answered May 31 '22 at 12:13

bruzza42


You should at least give credit to the comment that put you on the right track. – Mark Ransom May 31 '22 at 13:23
@MarkRansom it didn't help me. I reached out to the creator of Borb and he helped. – bruzza42 May 31 '22 at 15:48
