I am working on a utility that replaces the images in a PDF with smaller, monochrome (2-color B&W) versions, in order to shrink scanned PDFs. The program below (which is the whole thing) currently exports every image as a large .png file into the in directory. The user then takes these files, does any necessary image manipulation, and copies the results into the out directory under the same names but with the .jb2 extension. Running the program again should copy the modified files back into the PDF streams, replacing the original images.

Needless to say, it doesn't work. The stream headers are all correct, but I don't think the stream data is properly encoded for the JBIG2Decode filter, so none of the modified images show up in a reader. Since I'm replacing an existing stream, I can't use document.add(Image), so I have to do all this stream stuff manually. I may be missing an iText facility for doing this, but how am I supposed to get these images into the stream?
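One thing that may be worth ruling out: a standalone .jb2 file normally begins with an 8-byte JBIG2 file header, while the PDF specification expects the embedded form of the data, without the file header or the end-of-page and end-of-file segments. The sketch below only checks for that header. The class name Jbig2HeaderCheck is made up for illustration and is not part of iText, and whether this is what actually hides the images is an assumption, not something confirmed here.

```java
import java.util.Arrays;

/** Hypothetical helper (not an iText class): detects the standalone JBIG2 file header. */
final class Jbig2HeaderCheck {
    // ID string that opens a standalone JBIG2 file.
    private static final byte[] FILE_ID = {
        (byte) 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A
    };

    /** Returns true if the data starts with the JBIG2 file ID string. */
    static boolean hasFileHeader(byte[] data) {
        if (data.length < FILE_ID.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, FILE_ID.length), FILE_ID);
    }
}
```

It could be called on the bytes of each replacement file, for example Jbig2HeaderCheck.hasFileHeader(Files.readAllBytes(out.toPath())). If it returns true, the bytes are in the standalone file organization rather than the embedded one.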

The use of the .jb2 format was dictated by iText, but I can just as easily use a more common format like .gif. The important part is that the image placed in the PDF has a 2-color B&W palette and uses a compression format suited to monochrome text images (I'd prefer JBIG2, but CCITT Group 3 or 4 or RLE will work for me too). The goal is maximum space saving; I have no processing time requirements.
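To make the 2-color target concrete, here is a minimal sketch that uses only the Java standard library to turn any BufferedImage into a 1-bit black-and-white one. The names Binarize and toBilevel are hypothetical, and the fixed global threshold is an assumption; real scans would probably want something smarter, such as adaptive thresholding in an external tool.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: reduces an image to 1 bit per pixel with a fixed threshold. */
final class Binarize {
    /** Pixels whose luma is at least the threshold (0-255) become white, the rest black. */
    static BufferedImage toBilevel(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        // TYPE_BYTE_BINARY defaults to a 1-bit black/white palette.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luma weights.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                dst.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return dst;
    }
}
```

The result can be written out with ImageIO like the exported .png files, or handed to whichever encoder produces the final compressed data.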

Alternatively, if anyone knows any good utility programs to do what I'm trying to do, that would be just as well. I want to replace all the existing images in a PDF file with alternates (they need to be made available to be processed by an external application), and I need control over how the replacements are being compressed. It also has to be done in a manner suitable for batch mode processing, because I'm dealing with PDFs with hundreds of pages and one image per page, generally. I'm trying to reduce the size of my PDFs, but I need complete control over the compression, and I want to do all lossy compression myself. Acrobat's Reduce Size PDF function always mangles my images.
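For a sense of what controlling the compression by hand involves, RLE is the simplest of the options mentioned above. The sketch below follows the byte-oriented scheme of the PDF RunLengthDecode filter: a length byte of 0 to 127 means "copy the next length + 1 bytes", 129 to 255 means "repeat the next byte 257 - length times", and 128 marks end of data. The class name RunLength is hypothetical, the decoder is included only to check the round trip, and on scanned text this scheme will typically compress worse than JBIG2 or CCITT Group 4.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: run-length coding in the style of the PDF RunLengthDecode filter. */
final class RunLength {
    static byte[] encode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            // Measure the run of identical bytes starting at i (at most 128).
            int run = 1;
            while (i + run < in.length && run < 128 && in[i + run] == in[i]) {
                run++;
            }
            if (run >= 2) {
                out.write(257 - run); // repeat marker
                out.write(in[i]);
                i += run;
            } else {
                // Collect literal bytes until a repeat begins or 128 bytes are gathered.
                int start = i;
                i++;
                while (i < in.length && i - start < 128
                        && !(i + 1 < in.length && in[i] == in[i + 1])) {
                    i++;
                }
                out.write(i - start - 1); // literal marker
                out.write(in, start, i - start);
            }
        }
        out.write(128); // end-of-data marker
        return out.toByteArray();
    }

    static byte[] decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            int len = in[i++] & 0xFF;
            if (len == 128) {
                break; // end of data
            }
            if (len < 128) {
                out.write(in, i, len + 1);
                i += len + 1;
            } else {
                for (int k = 0; k < 257 - len; k++) {
                    out.write(in[i]);
                }
                i++;
            }
        }
        return out.toByteArray();
    }
}
```

The input would be the packed 1-bit rows of the image, and the image dictionary would then name RunLengthDecode as its filter instead of JBIG2Decode.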

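// Assumes iText 5.x (com.itextpdf packages).
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import javax.imageio.ImageIO;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.parser.PdfImageObject;

public class Test {
    public static void main(String[] args) throws IOException, DocumentException
    {
        PdfReader pdf = new PdfReader("data\\in.pdf");
        int n = pdf.getXrefSize();
        // Walk every object in the cross-reference table and pick out image streams.
        for (int i = 0; i < n; i++) {
            PdfObject object = pdf.getPdfObject(i);
            if (object == null || !object.isStream()) continue;
            PRStream stream = (PRStream)object;
            if (!stream.contains(PdfName.WIDTH)) continue;
            PdfImageObject image = new PdfImageObject(stream);
            BufferedImage bi = image.getBufferedImage();
            if (bi == null) continue;
            // First pass: export the image for external processing.
            File in = new File("data\\in\\" + i + ".png");
            if (!in.exists()) {
                ImageIO.write(bi, "png", in);
            }
            // Second pass: only replace images that have a processed counterpart.
            File out = new File("data\\out\\" + i + ".jb2");
            if (!out.exists()) continue;
            Image img = Image.getInstance(out.getPath());
            // Read the whole file; a single InputStream.read() call may return early.
            byte[] data = Files.readAllBytes(out.toPath());
            // The raw file bytes go into the stream unchanged (this is the part that fails to render).
            stream.clear();
            stream.setData(data, false, PRStream.NO_COMPRESSION);
            stream.put(PdfName.TYPE, PdfName.XOBJECT);
            stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
            stream.put(PdfName.FILTER, PdfName.JBIG2DECODE);
            stream.put(PdfName.WIDTH, new PdfNumber((int)img.getWidth()));
            stream.put(PdfName.HEIGHT, new PdfNumber((int)img.getHeight()));
            stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(1));
            stream.put(PdfName.COLORSPACE, PdfName.DEVICEGRAY);
        }
        // Write the modified document and release the reader.
        new PdfStamper(pdf, new FileOutputStream("data\\out.pdf")).close();
        pdf.close();
    }
}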
Mario Carneiro

1 Answer

I've written a library on CodePlex that may help you out.

It's used for OCRing and compressing scanned PDFs with JBIG2, and it has a delegate that lets you do some processing on each image before it's added to the PDF.

Alexis Pigeon
pwizzle