How to extract the byte [] array from a PdfDocument

Question

After a lot of research, I still can't find a way to extract a byte[] from a PdfDocument object. How can I achieve this?

I've tried FileInputStream, but I don't actually have a "physical path" for the PdfDocument, because I'm creating it programmatically. Moreover, I'm not very familiar with byte[].

Can someone give me a hand with this?

    PdfDocument pdfDocumentWithoutSplit = getPdfUtils().generatePdfDocumentByMedia(shippingLabel);

    for (int i = 1; i < pdfDocumentWithoutSplit.getNumberOfPages() + 1; i++) {
        final ByteArrayOutputStream pdfByteArray = new ByteArrayOutputStream();
        final PdfDocument pdfDocument = new PdfDocument(new PdfWriter(pdfByteArray));

        pdfDocument.movePage(pdfDocumentWithoutSplit.getPage(i), i);
        pdfByteArray.close();
        // now here I need to get the bytes of each pdfDocument somehow
    }

Cheers

Can you add code? Please be more explicit about "I'm creating one programmatically": what is your goal? If you are creating a PDF, you have the text in some variable, most probably a String, so you can get the byte array from that String. If you want to extract the byte array from the PdfDocument itself, you can create a temporary PDF file. – Julian Solarte, Mar 25 '19 at 14:22
I actually managed to split the pages of a physical PDF into PdfDocuments (1 page, 1 PdfDocument), and now I need to get the bytes of these PdfDocuments, none of which has a physical path. I added a snippet of my code to the question. – Nexussim Lements, Mar 25 '19 at 14:41

score 2 · Answer 1 · answered Mar 25 '19 at 14:55


    final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    final PdfDocument pdfDocument = new PdfDocument(new PdfWriter(baos));
    pdfDocument.movePage(pdfDocumentWithoutSplit.getPage(i), i);
    pdfDocument.close();
    // should close the PdfWriter, and hence the ByteArrayOutputStream
    baos.close();
    byte[] bytes = baos.toByteArray();

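Closing the PdfDocument flushes any data still buffered in memory and completes the PDF, so the ByteArrayOutputStream only holds the whole document after `close()`. Call `toByteArray()` after closing, not before.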
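The effect of closing can be reproduced with plain `java.io`, without iText. This is a minimal analogy under stated assumptions, not iText's implementation: a `BufferedOutputStream` stands in for the PDF writer, and the class `FlushDemo` and method `sizesBeforeAndAfterClose` are hypothetical names. Bytes written through the wrapper only show up in the `ByteArrayOutputStream` once the wrapper is flushed or closed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FlushDemo {

    /**
     * Writes text through a BufferedOutputStream wrapping a ByteArrayOutputStream
     * and returns {bytes visible in baos before close, bytes visible after close}.
     * Assumes the text is shorter than the default 8192-byte buffer.
     */
    static int[] sizesBeforeAndAfterClose(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            // Analogy only: the buffered wrapper plays the role of the PDF writer.
            OutputStream buffered = new BufferedOutputStream(baos);
            buffered.write(text.getBytes(StandardCharsets.US_ASCII));

            // A short write stays in the wrapper's buffer; nothing has reached baos yet.
            int before = baos.toByteArray().length;

            // close() flushes the buffer into baos, then closes baos (a no-op for it).
            buffered.close();
            int after = baos.toByteArray().length;

            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterClose("pretend this is a PDF body");
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

With a real PdfDocument the effect is stronger: besides flushing buffers, `close()` is when the cross-reference table and trailer are written, so bytes read before closing are not a complete PDF.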

answered Mar 25 '19 at 14:55

Joop Eggen

`baos.toByteArray()` returns only 15 bytes for a large document, which can't be right, and I also get this error: **[PdfReader] Error occurred while reading cross reference table. Cross reference table will be rebuilt. com.itextpdf.io.IOException: PDF startxref not found.** Any hint? – Nexussim Lements Mar 25 '19 at 15:52

@Saliffanag `PdfDocument.movePage` is documented to *move page to new place **in the same document***! You are trying to use it to move a page from `pdfDocumentWithoutSplit` to `pdfDocument`. This obviously won't work. In particular some exception is likely to be thrown. Do you by chance catch and ignore exceptions? – mkl Mar 25 '19 at 17:07
@Joop Eggen thank you so much, it really helped me. – Amol Bais Jul 21 '20 at 09:25

score 0 · Answer 2 · answered Mar 25 '19 at 14:26

Everything in a PDF should be processed as a string. First you will need to search for the physical path (you can use regex or similar string handling to search for the path based on how you're generating it and what language you're using). Then search in the PDF using a PDF reader (because it's not a plain text document) for a string that looks like your byte array. Finally, you will need to convert the string to an array by extracting the data inside and using a split or array generating method. Good luck.
