
How to get a byte array from iText PdfReader?

float width = 8.5f * 72;
float height = 11f * 72;
float tolerance = 1f;

PdfReader reader = new PdfReader("source.pdf");

for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    Rectangle cropBox = reader.getCropBox(i);
    float widthToAdd = width - cropBox.getWidth();
    float heightToAdd = height - cropBox.getHeight();
    if (Math.abs(widthToAdd) > tolerance || Math.abs(heightToAdd) > tolerance)
    {
        float[] newBoxValues = new float[] { 
            cropBox.getLeft() - widthToAdd / 2,
            cropBox.getBottom() - heightToAdd / 2,
            cropBox.getRight() + widthToAdd / 2,
            cropBox.getTop() + heightToAdd / 2
        };
        PdfArray newBox = new PdfArray(newBoxValues);

        PdfDictionary pageDict = reader.getPageN(i);
        pageDict.put(PdfName.CROPBOX, newBox);
        pageDict.put(PdfName.MEDIABOX, newBox);
    }
}
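The loop above moves each side of the crop box by half the size difference, so every page keeps its center and ends up 8.5 × 11 inches (612 × 792 pt). Here is that arithmetic for a single page as plain Java, with no iText involved. This is a sketch only: the class and method names are hypothetical, the tolerance check is left out, and the input is just an example value (a box of roughly A4 size, 595 × 842 pt, anchored at the origin).

```java
import java.util.Arrays;

public class CenterBox {
    // Target size from the question: US Letter in PDF points (1 inch = 72 pt).
    static final float WIDTH = 8.5f * 72;   // 612 pt
    static final float HEIGHT = 11f * 72;   // 792 pt

    /** Returns {llx, lly, urx, ury}, grown or shrunk symmetrically to WIDTH x HEIGHT. */
    static float[] centeredBox(float left, float bottom, float right, float top) {
        float widthToAdd = WIDTH - (right - left);     // negative if the page is too wide
        float heightToAdd = HEIGHT - (top - bottom);   // negative if the page is too tall
        return new float[] {
            left - widthToAdd / 2,
            bottom - heightToAdd / 2,
            right + widthToAdd / 2,
            top + heightToAdd / 2
        };
    }

    public static void main(String[] args) {
        // Roughly A4 (595 x 842 pt), lower-left corner at (0, 0).
        float[] box = centeredBox(0, 0, 595, 842);
        System.out.println(Arrays.toString(box)); // prints [-8.5, 25.0, 603.5, 817.0]
    }
}
```

The width grows by 17 pt (8.5 pt on each side) and the height shrinks by 50 pt (25 pt at top and bottom). That is why the same `newBox` can be written to both `/CropBox` and `/MediaBox`.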

From the code above, I need to get a byte array from the reader object. How?

1) Not working: I get an empty byteArray.

OutputStream out = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(reader, out);
stamper.close();

byte byteArray[] = (((ByteArrayOutputStream)out).toByteArray()); 

2) Not working: I get java.io.IOException: Error: Header doesn't contain versioninfo

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    // Appends the raw content stream of each page (drawing operators only)
    outputStream.write(reader.getPageContent(i));
}
PDDocument pdDocument = PDDocument.load(new ByteArrayInputStream(outputStream.toByteArray()));

Is there any other way to get a byte array from PdfReader?

Subbu
  • What on earth are you trying to do? What do you expect to be in the bytes? The original PDF file? A stream inside the PDF file? I'm downvoting the question (and I'll upvote it as soon as you define *byte array*), because right now the question doesn't make sense. – Bruno Lowagie Feb 06 '14 at 16:45
  • I may have asked the question wrongly. But my question is: how do I get the exact stream from PdfReader? In my code 2) above I am able to get a byte array, but when I load it into PDDocument I get the exception "Header doesn't contain versioninfo". Why? Please answer my question. – Subbu Feb 06 '14 at 17:50
  • This seems to refer to [this answer](http://stackoverflow.com/a/21378162/1729265). I just tested your option 1 for which you retrieve an empty byte array. My test turned out to produce an array of an appropriate size (i.e. identical to the size of a proper file produced using a `FileOutputStream` instead). Thus, the issue is to be found somewhere else in your setup. Maybe duplicate variable names in different scopes or something you apply to the byte array in the time between. Maybe some ignored exception... – mkl Feb 07 '14 at 13:13
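mkl's result, that option 1 does produce bytes, is consistent with how in-memory output works in general. With PdfStamper the output is only complete after `stamper.close()`, so the bytes must be read from the same `ByteArrayOutputStream` instance afterwards. The sketch below shows that pattern without iText and is only an analogy: the class and `openWriter()` are hypothetical, and a `BufferedOutputStream` stands in for PdfStamper as "a writer that holds data until it is closed".

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferCapture {
    // Hypothetical stand-in for PdfStamper: buffers what it writes and only
    // pushes the bytes to the target stream when flushed or closed.
    static OutputStream openWriter(OutputStream target) {
        return new BufferedOutputStream(target, 8192);
    }

    // Returns {bytes visible before close(), bytes visible after close()}.
    static int[] capture() {
        // Typed as ByteArrayOutputStream, so no cast is needed later.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            OutputStream writer = openWriter(baos);
            writer.write("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
            int before = baos.toByteArray().length; // data still sits in the buffer
            writer.close();                          // flushes the buffer into baos
            int after = baos.toByteArray().length;  // all 9 bytes are now available
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = capture();
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints "0 -> 9"
    }
}
```

If the byte array is taken from a different stream variable, or before `close()` has run (for example because an exception was swallowed), the result is empty or incomplete even though the writer itself works.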

2 Answers

Let's look at the question from a different angle. It seems to me that you want to render a PDF page by page. If so, then your question is all wrong. As I already indicated, extracting the page content stream is not sufficient: no renderer will be able to render such a stream, because you don't pass along any of the resources it needs, such as fonts, Form XObjects, Image XObjects, and so on.

If you want to render separate pages from a PDF, you need to burst the document into separate single-page, full-blown PDF documents. These single-page documents need to contain all the information necessary to render the page. This isn't memory friendly: suppose that you have a 100 KByte document of 10 pages where every page shows an 80 KByte logo. You'll end up with 10 documents that are each at least 80 KByte (10 × 80 KByte already makes 800 KByte, far more than the original 10-page document, in which a single Image XObject is shared by all 10 pages).

You'd need to do something like this:

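import java.io.ByteArrayOutputStream;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

PdfReader reader = new PdfReader("source.pdf");
int n = reader.getNumberOfPages();
reader.close();
ByteArrayOutputStream baos;
PdfStamper stamper;
for (int i = 1; i <= n; i++) {
    // Re-read the original file and keep only page i
    reader = new PdfReader("source.pdf");
    reader.selectPages(String.valueOf(i));
    // Write a complete single-page PDF, including the resources the page uses
    baos = new ByteArrayOutputStream();
    stamper = new PdfStamper(reader, baos);
    stamper.close();
    // doSomethingWithBytes() is a placeholder for your own code
    doSomethingWithBytes(baos.toByteArray());
}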

In this case, baos.toByteArray() will contain the bytes of a valid PDF file. This wasn't the case in any of your attempts.
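A quick way to tell the two kinds of `byte[]` apart: a PDF file starts with a header such as `%PDF-1.4`, which is where the version number PDFBox complained about lives. A page content stream returned by `getPageContent()` starts directly with drawing operators. The helper below is a hypothetical sketch of that check. Real parsers can be more lenient about exactly where the header appears, so treat it as a heuristic, not a validator.

```java
import java.nio.charset.StandardCharsets;

public class PdfHeaderCheck {
    // Hypothetical helper: true if the bytes begin with the "%PDF-" file header.
    static boolean startsWithPdfHeader(byte[] bytes) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (bytes.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (bytes[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Start of a complete PDF file vs. a typical text-drawing content stream.
        byte[] fullFile = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] contentStream = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithPdfHeader(fullFile));      // prints true
        System.out.println(startsWithPdfHeader(contentStream)); // prints false
    }
}
```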

Nick Spreitzer
Bruno Lowagie
  • The question is a follow-up to [this answer](http://stackoverflow.com/a/21378162/1729265). For some reason the OP has trouble using a `ByteArrayOutputStream` instead of the `FileOutputStream` used there. I assume the reason is totally unrelated to iText but instead is based in some variable scoping, exception catching, or similar basic stuff. – mkl Feb 07 '14 at 13:17
  • Thank you for the clarification, mkl! As for @Subbu, you should really make more of an effort to read people's answers! – Bruno Lowagie Feb 07 '14 at 13:33
PdfReader reader = new PdfReader("source.pdf");
byte byteArray[] = reader.getPageContent(1); // page 1

Also have a look at this link

AmitG
  • Er... OK, so you get a stream containing PDF syntax. What about the XObjects? The question is all wrong, and I'm sorry to say, but your answer doesn't really make sense. – Bruno Lowagie Feb 06 '14 at 16:46
  • Hey Bruno, thanks for the correction. I know you are the author of this [PdfReader](http://grepcode.com/file/repo1.maven.org/maven2/com.lowagie/itext/2.1.2/com/lowagie/text/pdf/PdfReader.java#PdfReader.getPageContent%28int%29) API. I thought he wanted a byte[] return type, and the API has a method for that. Maybe he will have to explain the question again. – AmitG Feb 06 '14 at 16:53
  • There are many methods in `PdfReader` that return an array of `byte`s. However, it's unclear what Subbu needs: does he want to extract a stream from the PDF? I assume he wants the complete PDF as a `byte[]`, but why would you use iText or `PdfReader` to do that? You can do that with plain old Java. – Bruno Lowagie Feb 06 '14 at 16:58
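For the case Bruno describes last, getting the complete, unmodified PDF as a `byte[]`, the standard library is enough. A minimal sketch, assuming Java 7 or later (`java.nio.file.Files.readAllBytes`). The class and method names are hypothetical, and `source.pdf` is simply the file name used in the question. Note that this returns the original file; it does not include the box changes made through `PdfReader`, which only end up in bytes once a `PdfStamper` writes them out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWholePdf {
    // Reads the whole file into memory; no PDF library is involved.
    static byte[] readPdf(Path path) throws IOException {
        return Files.readAllBytes(path);
    }

    public static void main(String[] args) throws IOException {
        // "source.pdf" is the file name from the question; adjust as needed.
        byte[] bytes = readPdf(Paths.get("source.pdf"));
        System.out.println(bytes.length + " bytes read");
    }
}
```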