Compressing PDF without X-Object

Question

I have several methods for manipulating my PDF files, such as converting them to .jpg images in order to compress them. Now I have a PDF file that doesn't have an X-Object, i.e., I cannot turn it into a .jpg to compress it. So I decided to take the entire PDF file and try to compress it some other way. I tried using iText's PdfStamper and pdfBox.addCompression (deprecated), but none worked so far. My code follows:

    public static byte[] compressPdf(final byte[] imageBytes) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {

            final PdfReader reader = new PdfReader(imageBytes);
            final PdfStamper stamper = new PdfStamper(reader, out, PdfWriter.VERSION_1_7);

            stamper.getWriter().setFullCompression();
            stamper.getWriter().setCompressionLevel(9);

            int total = reader.getNumberOfPages() + 1;
            for (int i = 1; i < total; i++) {
                reader.setPageContent(i, reader.getPageContent(i));
            }

            stamper.close();
            reader.close();

            return out.toByteArray();
        } catch (Exception e) {
            e.printStackTrace();
        }

        return null;
    }

Notice that the setFullCompression and setCompressionLevel calls on the stamper's writer aren't working.

Furthermore *"but none worked so far."* / *"aren't working"* aren't proper descriptions of your observations. Is some exception thrown? Or does simply no further compression occur? The reason might be that the file already is pretty well compressed. Or does something else happen? — mkl, May 09 '18 at 18:45

score 1 · Accepted Answer · answered May 28 '18 at 14:10

The PDF document you are displaying is merely a wrapper round an image.

Allow me to elaborate.

Normally, a PDF consists of instructions for a viewer. Something like:

go to coordinates 50, 50
set the font to Helvetica, size 12
draw the glyph for character 'H'
etc

These instructions are gathered into objects. And similarly, the resources they use (like images, fonts, etc) are also grouped into objects.

Each object gets assigned a number. Those are the numbers in the XREF.

When iText attempts to apply compression, it will go looking for streams (that is, streams of instructions, fonts, etc.) and will attempt to compress those.

Your PDF contains only 1 image.

iText will not compress your image (since that may result in loss of quality).
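To illustrate why lossless stream compression gains little on such a file, here is a minimal sketch using only java.util.zip. It deflates a text-like content stream and a block of high-entropy bytes of the same size. The class and method names are hypothetical, and the seeded random bytes are only a stand-in for already-encoded JPEG data; this is not how iText works internally.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical demo class, not part of iText or PDFBox.
class FlateDemo {

    /** Deflates the input at the highest compression level. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Repeated text-drawing operators, mimicking a page content stream. */
    static byte[] sampleContentStream() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("BT /F1 12 Tf 50 ").append(50 + i).append(" Td (Hello) Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    /** Assumption: seeded pseudo-random bytes stand in for JPEG-encoded image data. */
    static byte[] sampleImageData(int length) {
        byte[] data = new byte[length];
        new Random(42).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        byte[] text = sampleContentStream();
        byte[] image = sampleImageData(text.length);
        System.out.printf("content stream:  %d -> %d bytes%n", text.length, deflate(text).length);
        System.out.printf("image-like data: %d -> %d bytes%n", image.length, deflate(image).length);
    }
}
```

The instruction stream shrinks considerably, while the image-like data stays roughly the same size. That would be consistent with a compression setting having no visible effect on a PDF that is essentially one scanned image.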

What you can do:

- Do not use scanned documents; use 'real' PDF documents (your end-users will be grateful).
- Extract the image from your PDF (using iText), compress the image (using an image processing library), and re-insert the image into the resources.
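The middle step of the second option (compressing the image) can be sketched with the standard javax.imageio API alone. This is a minimal sketch under assumptions: the class name, method names and quality values are hypothetical, the input is taken to be the already-extracted image bytes, and extracting from and re-inserting into the PDF with iText is not shown.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageInputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical helper class, not part of iText or PDFBox.
class JpegRecompressor {

    /** Encodes an RGB image as JPEG; quality ranges from 0.0f (smallest) to 1.0f (best). */
    static byte[] encodeJpeg(BufferedImage image, float quality) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream ios = new MemoryCacheImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }

    /** Decodes the extracted image bytes and re-encodes them as a lower-quality JPEG. */
    static byte[] recompress(byte[] imageBytes, float quality) throws IOException {
        BufferedImage decoded = ImageIO.read(
                new MemoryCacheImageInputStream(new ByteArrayInputStream(imageBytes)));
        if (decoded == null) {
            throw new IOException("Image format not supported by ImageIO");
        }
        // JPEG has no alpha channel, so redraw onto an opaque RGB canvas first.
        BufferedImage rgb = new BufferedImage(
                decoded.getWidth(), decoded.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.drawImage(decoded, 0, 0, Color.WHITE, null);
        g.dispose();
        return encodeJpeg(rgb, quality);
    }

    /** Builds a noisy test image; a stand-in for a scanned page. */
    static BufferedImage sampleImage(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(7);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = encodeJpeg(sampleImage(256, 256), 0.95f);
        byte[] smaller = recompress(original, 0.3f);
        System.out.printf("JPEG size: %d -> %d bytes%n", original.length, smaller.length);
    }
}
```

Lowering the quality is lossy, so the value is a trade-off between file size and legibility of the scan; 0.3f here is only an illustrative choice.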

Thanks for your help Joris, I did what you said and it worked. — Marcos Silva, Jul 23 '18 at 17:09
