
I'm having a problem with image compression. I used the answer described in the question "compress pdf with large images via java". If I set the FACTOR variable to 0.9f or 1f (original size), the resulting PDF file becomes bigger than the original. That is not the case for all files: some files I created myself get smaller as planned, but others grow by about a third, and on top of that some images end up with black backgrounds. It gets even worse when I use normal image compression without resizing the image. This is my test file.

Lowagie's method (resizing the images):

    // TODO Auto-generated method stub
    PdfName key = new PdfName("ITXT_SpecialId");
    PdfName value = new PdfName("123456789");
    // Read the file
    PdfReader reader = new PdfReader(args[0]);
    int n = reader.getXrefSize();
    PdfObject object;
    PRStream stream;
    // Look for image and manipulate image stream
    for (int i = 0; i < n; i++) {
        object = reader.getPdfObject(i);
        if (object == null || !object.isStream())
            continue;
        stream = (PRStream)object;
       // if (value.equals(stream.get(key))) {
        PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
        System.out.println(stream.type());
        if (pdfsubtype != null && pdfsubtype.toString().equals(PdfName.IMAGE.toString())) {
            PdfImageObject image = new PdfImageObject(stream);
            BufferedImage bi = image.getBufferedImage();
            if (bi == null) continue;
            int width = (int)(bi.getWidth() * 1f);
            int height = (int)(bi.getHeight() * 1f);
            BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            AffineTransform at = AffineTransform.getScaleInstance(1f, 1f);
            Graphics2D g = img.createGraphics();
            g.drawRenderedImage(bi, at);
            ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
            ImageIO.write(img, "JPG", imgBytes);
            stream.clear();
            stream.setData(imgBytes.toByteArray(), false, PRStream.BEST_COMPRESSION);
            stream.put(PdfName.TYPE, PdfName.XOBJECT);
            stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
            stream.put(key, value);
            stream.put(PdfName.FILTER, PdfName.DCTDECODE);
            stream.put(PdfName.WIDTH, new PdfNumber(width));
            stream.put(PdfName.HEIGHT, new PdfNumber(height));
            stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
            stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
        }
    }
    // Save altered PDF
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("/Applications/XAMPP/xamppfiles/htdocs/pdf_compress/download/"+args[2]));
    stamper.close();
    reader.close();

My method (real compression by setting the quality of the image instead of resizing it):

        PdfReader reader = new PdfReader(args[0]);

        // Read the file
        int n = reader.getXrefSize();
        PdfObject object;
        PRStream stream;
        // Look for image and manipulate image stream
        for (int i = 0; i < n; i++) {
            object = reader.getPdfObject(i);

            if (object == null || !object.isStream())
                continue;
            stream = (PRStream)object;


            PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
            if (pdfsubtype != null && pdfsubtype.toString().equals(PdfName.IMAGE.toString())) {


                System.out.println(pdfsubtype.length());
                PdfImageObject image = new PdfImageObject(stream);

                BufferedImage bi = image.getBufferedImage();


                if (bi == null) continue;
                int width = (int)(bi.getWidth());
                int height = (int)(bi.getHeight());


                if(width <=30 || height <=30){
                    continue;

                }
                BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
                AffineTransform at = null;
                Graphics2D g = img.createGraphics();
                g.drawRenderedImage(bi, at );
                ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
                Iterator iter = ImageIO.getImageWritersByFormatName("JPG");
                ImageWriter writer = (ImageWriter)iter.next();
                ImageWriteParam iwp = writer.getDefaultWriteParam();
                iwp.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
// here goes the compression
                iwp.setCompressionQuality(Float.valueOf(args[1]));
                ImageOutputStream imageos = ImageIO.createImageOutputStream(imgBytes);
                writer.setOutput(imageos);
                IIOImage images = new IIOImage(img, null, null);

                writer.write(null,images , iwp);
                imageos.close();
                writer.dispose();

                stream.clear();
                stream.setData(imgBytes.toByteArray(), false, PRStream.BEST_COMPRESSION);
                stream.put(PdfName.TYPE, PdfName.XOBJECT);
                stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
                stream.put(PdfName.FILTER, PdfName.DCTDECODE);
                stream.put(PdfName.WIDTH, new PdfNumber(width));
                stream.put(PdfName.HEIGHT, new PdfNumber(height));
                stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
                stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
            }
        }           
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("/Applications/XAMPP/xamppfiles/htdocs/pdf_compress/download/"+args[2]));
        stamper.setFullCompression();

        stamper.close();
        reader.close();
        System.out.println("Done");

What is wrong with the code? Should I use a different image compression method? Are there any others?
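One possible cause of the black backgrounds, offered as a guess rather than something confirmed for the test file: both listings draw the decoded image onto a fresh `TYPE_INT_RGB` canvas, which starts out black, so any transparent pixels in the decoded image stay black. The sketch below uses only the JDK (no iText) and paints a background colour first; `AlphaFlattener` and `flatten` are hypothetical names. It does not cover transparency stored in a separate `/SMask` entry, which `stream.clear()` removes along with every other dictionary entry.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class AlphaFlattener {

    // Copies src onto an opaque RGB canvas, painting the background first
    // so that transparent pixels do not fall through to the default black.
    static BufferedImage flatten(BufferedImage src, Color background) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, dst.getWidth(), dst.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // A fully transparent 4x4 image stands in for a decoded image with alpha.
        BufferedImage transparent =
                new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);

        // Naive copy, as in the listings above: nothing is painted, canvas stays black.
        BufferedImage naive = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = naive.createGraphics();
        g.drawImage(transparent, 0, 0, null);
        g.dispose();

        BufferedImage flat = flatten(transparent, Color.WHITE);
        System.out.printf("naive     = %06X%n", naive.getRGB(0, 0) & 0xFFFFFF);
        System.out.printf("flattened = %06X%n", flat.getRGB(0, 0) & 0xFFFFFF);
    }
}
```

In the loops above, the equivalent change would be a `fillRect` call on `g` before `g.drawRenderedImage(bi, at)`. Whether white is the right background depends on the page behind the image.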

Alex P
  • [A bad workman always blames Lowagie](http://en.wiktionary.org/wiki/a_bad_workman_always_blames_his_tools) ;-) You are converting all your images to JPEGs (that's what DCTDECODE is about). Surely you understand that other formats are a better fit for some image types. Why don't you start by checking the type of your images, making sure you only reduce the resolution of the JPEGs and leave the other types of images intact? – Bruno Lowagie May 27 '15 at 14:10
  • For instance: on page 7 (or C5 if you prefer), you have a lot of small images of only a couple of pixels. You shouldn't convert those to JPEGs. They are already as small as they can be. – Bruno Lowagie May 27 '15 at 14:25
  • Exception in thread "main" javax.imageio.IIOException: Unsupported Image Type at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1043) at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:1014) at javax.imageio.ImageIO.read(ImageIO.java:1422) at javax.imageio.ImageIO.read(ImageIO.java:1326) at com.itextpdf.text.pdf.parser.PdfImageObject.getBufferedImage(PdfImageObject.java:405) at Classes.Test.main(Test.java:59) I get that at the line with image.getBufferedImage (with a different file). – Alex P May 28 '15 at 08:37
  • It's a file with a JPEG in the CMYK color space. – Alex P May 28 '15 at 08:55
  • That's not an iText question, is it? JPEGs are embedded in a PDF as-is. Not a single byte is changed by iText. Not when the PDF is created, not when you extract the bytes. When you look at the stack trace, you see that it is a pure Java problem: the imageio classes don't support such a PDF hence you have to find imaging classes that do. – Bruno Lowagie May 28 '15 at 09:29
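The two points raised in these comments (do not re-encode images that would not get smaller, and expect `ImageIO` to fail on some JPEGs) can be combined into one guard. This is a minimal sketch using only the JDK; `JpegRecompressor`, `encodeJpeg` and `recompressOrKeep` are hypothetical names, not part of iText. It assumes the caller already has the raw JPEG bytes of a DCTDecode stream (in iText 5 I believe `PdfReader.getStreamBytesRaw` returns them, but treat that as an assumption).

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

final class JpegRecompressor {

    // Encodes img as a JPEG with an explicit quality between 0.0 and 1.0.
    static byte[] encodeJpeg(BufferedImage img, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ImageOutputStream ios = new MemoryCacheImageOutputStream(out)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.setOutput(ios);
            writer.write(null, new IIOImage(img, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }

    // Returns a re-encoded JPEG only if it is strictly smaller than the input.
    // If decoding or encoding fails (for example "Unsupported Image Type"),
    // or no reader recognises the bytes, the original array is returned as-is.
    static byte[] recompressOrKeep(byte[] original, float quality) {
        try {
            BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(original));
            if (decoded == null) {
                return original;
            }
            BufferedImage rgb = new BufferedImage(
                    decoded.getWidth(), decoded.getHeight(), BufferedImage.TYPE_INT_RGB);
            Graphics2D g = rgb.createGraphics();
            g.drawImage(decoded, 0, 0, null);
            g.dispose();
            byte[] candidate = encodeJpeg(rgb, quality);
            return candidate.length < original.length ? candidate : original;
        } catch (IOException | RuntimeException e) {
            // Any failure means "leave this image alone".
            return original;
        }
    }

    public static void main(String[] args) {
        // Noise compresses poorly, which makes the quality difference visible.
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        Random rnd = new Random(42);
        for (int y = 0; y < 64; y++) {
            for (int x = 0; x < 64; x++) {
                img.setRGB(x, y, rnd.nextInt(0x1000000));
            }
        }
        byte[] original = encodeJpeg(img, 0.95f);
        byte[] result = recompressOrKeep(original, 0.3f);
        System.out.println("original bytes: " + original.length);
        System.out.println("result bytes:   " + result.length);
    }
}
```

In the PDF loop, the stream would only be cleared and rewritten when the returned array is a different object from the input. That way tiny images and unreadable CMYK JPEGs pass through untouched instead of growing or aborting the run.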

1 Answer


When I only replace JPEGs, I already get a lower file size. Removing the unused objects also helps:

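import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ReduceSize {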

    public static final String SRC = "resources/pdfs/annual_report_2009.pdf";
    public static final String DEST = "results/images/annual_report_2009.pdf";
    public static final float FACTOR = 0.5f;

    public static void main(String[] args) throws DocumentException, IOException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new ReduceSize().manipulatePdf(SRC, DEST);
    }
    public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
        PdfReader reader = new PdfReader(src);
        int n = reader.getXrefSize();
        PdfObject object;
        PRStream stream;
        // Look for image and manipulate image stream
        for (int i = 0; i < n; i++) {
            object = reader.getPdfObject(i);
            if (object == null || !object.isStream())
                continue;
            stream = (PRStream)object;
            if (!PdfName.IMAGE.equals(stream.getAsName(PdfName.SUBTYPE)))
                continue;
            if (!PdfName.DCTDECODE.equals(stream.getAsName(PdfName.FILTER)))
                continue;
            PdfImageObject image = new PdfImageObject(stream);
            BufferedImage bi = image.getBufferedImage();
            if (bi == null)
                continue;
            int width = (int)(bi.getWidth() * FACTOR);
            int height = (int)(bi.getHeight() * FACTOR);
            if (width <= 0 || height <= 0)
                continue;
            BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            AffineTransform at = AffineTransform.getScaleInstance(FACTOR, FACTOR);
            Graphics2D g = img.createGraphics();
            g.drawRenderedImage(bi, at);
            ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
            ImageIO.write(img, "JPG", imgBytes);
            stream.clear();
            stream.setData(imgBytes.toByteArray(), false, PRStream.NO_COMPRESSION);
            stream.put(PdfName.TYPE, PdfName.XOBJECT);
            stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
            stream.put(PdfName.FILTER, PdfName.DCTDECODE);
            stream.put(PdfName.WIDTH, new PdfNumber(width));
            stream.put(PdfName.HEIGHT, new PdfNumber(height));
            stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
            stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
        }
        reader.removeUnusedObjects();
        // Save altered PDF
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        stamper.setFullCompression();
        stamper.close();
        reader.close();
    }
}

This reduces the 10,510 KB file to 9,159 KB. Of course: fonts also take up quite some space.

Bruno Lowagie
  • The fonts in this document take up 2% of the space (according to Acrobat), so there's not much to gain there. There's about 30% in image data and 50% in "Content Streams", which is surprisingly high, but I haven't looked in more detail at what that is... – David van Driessche May 27 '15 at 15:35
  • OK, I didn't look too closely at the fonts. I did see plenty of XML when I scrolled through the document in a text editor and there are quite some "pixel" size images. – Bruno Lowagie May 27 '15 at 15:46
  • Yes, there's a bunch of XMP metadata that could be removed to gain some space too... – David van Driessche May 27 '15 at 16:22