0

Looking for a way to reduce PDF quality by changing the DPI in Java.

For example, I tried the PDFBox and iText libraries but still couldn't achieve it. Specifically, I need to set the DPI when the current PDF DPI is higher than a target value (I need to reduce the quality of scanned documents).

Please note that I am only looking for free and open-source libraries.

Asanka Rox
  • Scanned, every page a grayscale image? Though dots-per-inch is the unit of paper publishing, for images the manageable properties are sizes in pixels and color model (number of colors/indexed colors). – Joop Eggen Apr 02 '19 at 08:24
  • The DPI is only available indirectly: https://stackoverflow.com/questions/5472711/dpi-of-image-extracted-from-pdf-with-pdfbox Replacing images with lower-DPI versions is very tricky. Someone is currently trying to do it on the PDFBox users mailing list (archive here: https://mail-archives.apache.org/mod_mbox/pdfbox-users/ , thread "resize inline images", started in February, still active in March). While that project is about inline images, some parts can be reused. You need a good understanding of PDFBox. – Tilman Hausherr Apr 02 '19 at 11:14
  • Have you considered changing the compression algorithms instead of the DPI? You can often get much better compression, without sacrificing resolution/quality, by using the different compression options available in the PDF standard. Are your PDF files similar, or are you looking for a general solution? If similar, perhaps you could post one or more examples here for review. – Ryan Apr 02 '19 at 21:19
  • @JoopEggen, actually I need to reduce the DPI of color scanned PDFs. The task is: once a user uploads a PDF, I need to check the DPI and reduce it if it's higher than 500 DPI. – Asanka Rox Apr 03 '19 at 02:13
  • @Tilman Hausherr thanks for the information. I am currently using the https://stackoverflow.com/questions/5472711/dpi-of-image-extracted-from-pdf-with-pdfbox thread to calculate the DPI of the PDF. – Asanka Rox Apr 03 '19 at 02:15
  • @Ryan, actually I need to reduce the DPI of color scanned PDFs. The task is: once a user uploads a PDF, I need to check the DPI and reduce it if it's higher than 500 DPI. But if I can control the resolution of the (scanned) PDF, then no problem. – Asanka Rox Apr 03 '19 at 02:17
  • Why is 500 DPI the limit? Why not higher? – Ryan Apr 03 '19 at 18:39
  • @Ryan it's a user requirement. – Asanka Rox Apr 04 '19 at 04:05

2 Answers

2

In the end, the solution that worked best for me uses the itextpdf library. It reduces the DPI by resampling each embedded image with a scale factor.

For example: factor = newDPI / currentDPI (the code below uses FACTOR = 0.5f, which halves the DPI).

import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

import javax.imageio.ImageIO;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.parser.PdfImageObject;

public class ReduceSize {

    public static final String SRC = "/Users/xxxx/Downloads/low/input.pdf";
    public static final String DEST = "/Users/xxxx/Downloads/low/output.pdf";
    public static final float FACTOR = 0.5f;

    public static void main(String[] args) throws DocumentException, IOException {
        File file = new File(DEST);
        file.getParentFile().mkdirs();
        new ReduceSize().manipulatePdf(SRC, DEST);
    }
    public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
        PdfReader reader = new PdfReader(src);
        int n = reader.getXrefSize();
        PdfObject object;
        PRStream stream;
        // Walk every indirect object; only JPEG (DCTDecode) image streams are resampled
        for (int i = 0; i < n; i++) {
            object = reader.getPdfObject(i);
            if (object == null || !object.isStream())
                continue;
            stream = (PRStream)object;
            if (!PdfName.IMAGE.equals(stream.getAsName(PdfName.SUBTYPE)))
                continue;
            if (!PdfName.DCTDECODE.equals(stream.getAsName(PdfName.FILTER)))
                continue;
            PdfImageObject image = new PdfImageObject(stream);
            BufferedImage bi = image.getBufferedImage();
            if (bi == null)
                continue;
            int width = (int)(bi.getWidth() * FACTOR);
            int height = (int)(bi.getHeight() * FACTOR);
            if (width <= 0 || height <= 0)
                continue;
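            // Draw the original image, scaled by FACTOR, into a smaller RGB image
            BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            AffineTransform at = AffineTransform.getScaleInstance(FACTOR, FACTOR);
            Graphics2D g = img.createGraphics();
            g.drawRenderedImage(bi, at);
            g.dispose(); // release the graphics context once drawing is done
            // Re-encode the scaled image as JPEG so it can be stored with DCTDecode
            ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
            ImageIO.write(img, "JPG", imgBytes);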
            stream.clear();
            stream.setData(imgBytes.toByteArray(), false, PRStream.NO_COMPRESSION);
            stream.put(PdfName.TYPE, PdfName.XOBJECT);
            stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
            stream.put(PdfName.FILTER, PdfName.DCTDECODE);
            stream.put(PdfName.WIDTH, new PdfNumber(width));
            stream.put(PdfName.HEIGHT, new PdfNumber(height));
            stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
            stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
        }
        reader.removeUnusedObjects();
        // Save altered PDF
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        stamper.setFullCompression();
        stamper.close();
        reader.close();
    }

}
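
The code above applies a fixed FACTOR to every image. As a minimal sketch of how the factor could instead be derived from the formula factor = newDPI / currentDPI, the following assumes that the scanned image covers the full page width and that the page uses the default PDF user space of 72 points per inch. The class name `DpiFactor`, its method names, and the sample numbers are hypothetical and not part of iText; in iText 5 the page width in points could be read with `reader.getPageSize(pageNumber).getWidth()`.

```java
public class DpiFactor {

    /** Default PDF user space: 72 points per inch (assumes no UserUnit scaling). */
    static final double POINTS_PER_INCH = 72.0;

    /**
     * Effective DPI of an image along one axis, assuming the image is
     * stretched over displayedPoints of page space along that axis.
     */
    static double effectiveDpi(int pixels, double displayedPoints) {
        if (pixels <= 0 || displayedPoints <= 0) {
            throw new IllegalArgumentException("pixels and displayedPoints must be positive");
        }
        return pixels * POINTS_PER_INCH / displayedPoints;
    }

    /**
     * Scale factor that brings currentDpi down to maxDpi.
     * Returns 1.0f when the image is already at or below the limit,
     * so such images are left untouched.
     */
    static float scaleFactor(double currentDpi, double maxDpi) {
        if (currentDpi <= maxDpi) {
            return 1.0f;
        }
        return (float) (maxDpi / currentDpi);
    }

    public static void main(String[] args) {
        // Hypothetical scan: 4960 pixels wide on a page that is 595 points wide
        double dpi = effectiveDpi(4960, 595.0);
        float factor = scaleFactor(dpi, 500.0);
        System.out.println("current DPI = " + dpi + ", factor = " + factor);
    }
}
```

With a helper like this, the loop could skip any image whose factor is 1.0f and pass the computed factor to the scaling step in place of the constant, so that only scans above the limit (500 DPI in the question) are resampled.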
Asanka Rox
  • 2
    You are aware that your solution only touches very specific image types? (Your question does not indicate that your requirement covers embedded JPEGs only.) On the other hand it completely ignores the original DPI value even though your question indicated that images should be changed only if they had a high original DPI value. So all in all that code might be enough for you to click *resolved* on your task at work but in response to your question it is a very poor answer. – mkl Apr 17 '19 at 13:57
  • @mkl My solution is only for scanned PDFs, and if you read my question you can understand my requirement (I need to reduce the quality of scanned documents). The solution does not target the JPG format; however, it does temporarily save the image in JPG format. – Asanka Rox Apr 29 '19 at 07:52
  • 1
    @AsankaRox I went through the licensing of iText, just want to ask a question. Is it free to use for a website backend service? – The Coder Jan 11 '20 at 10:54
0

Please try full compression:

PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest),
        PdfWriter.VERSION_1_5);
stamper.getWriter().setCompressionLevel(9);
int total = reader.getNumberOfPages() + 1;
for (int i = 1; i < total; i++) {
    reader.setPageContent(i, reader.getPageContent(i));
}
stamper.setFullCompression();
stamper.close();
Viren