
I want to convert a PDF to a TIFF using PDFBox 2.x and the PDFRenderer class.

But it runs very slowly compared to Ghostscript.

Here's my sample code:

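import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public class SpeedTest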
{
    static long startTime = System.currentTimeMillis ();

    public static void logTime (String msg)
    {
        long now = System.currentTimeMillis ();
        System.out.println (String.format ("%.3f: %s", (now - startTime) / 1000.0, msg));
        startTime = now;
    }

    public static void main (String[] args) throws Exception
    {
        //System.setProperty ("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");

        String pdfFileName = args[0];
        String tiffFileName = args[1];

        PDDocument document = PDDocument.load (new File (pdfFileName));
        logTime (pdfFileName + " loaded.");
        PDFRenderer pdfRenderer = new PDFRenderer (document);
        logTime ("intitalized renderer.");
        BufferedImage img = pdfRenderer.renderImageWithDPI (0, 600, ImageType.RGB);
        logTime ("page rendered as image.");
        ImageIO.write (img, "TIFF", new File (tiffFileName));
        logTime ("image saved as TIFF.");
        document.close ();
    }
}

The output is as follows:

0.521: sample.pdf loaded.
0.013: intitalized renderer.
2.910: page rendered as image.
2.005: image saved as TIFF.

As you can see, the call to pdfRenderer.renderImageWithDPI takes almost 3 seconds, and the ImageIO.write call takes another 2 seconds.

Doing the same with Ghostscript finishes the complete task in about 0.4 seconds:

time gs -dQUIET -dBATCH -dNOPAUSE -sstdout=/dev/null -sDEVICE=tifflzw -r600 -dFirstPage=1 -dLastPage=1 -sOutputFile=sample.tif sample.pdf

real    0m0.389s
user    0m0.340s
sys     0m0.048s

I've also tried

System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");

as I'm running Java 8 (1.8.0_161 to be precise), but that makes no difference.

Thanks for any ideas. Regards,

Thomas

tombo_189
  • There is no solution. Your code is fine. Java is usually slower than C++. The upcoming 2.0.9 version will have speed improvements for some PDF files, but it will never be as fast as Ghostscript. By the way, you're complaining about the performance of `ImageIO.write`, which isn't even part of PDFBox. – Tilman Hausherr Mar 04 '18 at 23:05
  • I was surprised by the performance because I thought that "doing this small task myself" would be much faster than using a big standard tool. But then I realized that something was going wrong, and now I'm wondering about both PDFBox and ImageIO (I'm using the TwelveMonkeys extension for writing TIFFs) and, yes, about the performance of Java in general, too. The speed difference between C++ and Java would then be 1:10! Can this be true? Thanks for the support and for confirming that the code itself is fine! – tombo_189 Mar 05 '18 at 07:05
  • TwelveMonkeys is excellent; we are using small parts of it. If you can share the PDF, I could have a look with the Java profiler... but 3 seconds at 600 dpi is rather a "good" number. Some simple PDFs may be faster, but some complex PDFs (with shadings) can be much, much slower. I have some PDFs that take >30 sec at 72 dpi. – Tilman Hausherr Mar 05 '18 at 07:11
  • The PDF is nothing special. Images and shadings were my first thought, too. Before posting this question I tried different examples. The case above is just a standard letter with recipient, date and a small sample text, using only one standard font and no images. It was created and exported as PDF by OpenOffice, so it is really as simple as can be. I'll stay with Ghostscript and spawn a system process from the Java main control program to do the rendering with gs. – tombo_189 Mar 05 '18 at 08:44
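The fallback described in the last comment, spawning gs from the Java control program, could look roughly like the sketch below. The flags are copied from the gs command line in the question, except that `-sstdout=/dev/null` is left out because it is platform specific. The class and method names (`GhostscriptTiff`, `buildCommand`, `render`) are made up for illustration, and the sketch assumes that `gs` is on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class GhostscriptTiff
{
    // Builds the argument list for a single-page render, mirroring the
    // gs command line from the question. Assumes "gs" is on the PATH.
    static List<String> buildCommand (String pdfFileName, String tiffFileName, int dpi, int page)
    {
        List<String> cmd = new ArrayList<> ();
        cmd.add ("gs");
        cmd.add ("-dQUIET");
        cmd.add ("-dBATCH");
        cmd.add ("-dNOPAUSE");
        cmd.add ("-sDEVICE=tifflzw");
        cmd.add ("-r" + dpi);
        cmd.add ("-dFirstPage=" + page);   // gs page numbers start at 1
        cmd.add ("-dLastPage=" + page);
        cmd.add ("-sOutputFile=" + tiffFileName);
        cmd.add (pdfFileName);
        return cmd;
    }

    // Starts gs, forwards its output to this process, and waits for it to finish.
    static int render (String pdfFileName, String tiffFileName, int dpi, int page)
            throws IOException, InterruptedException
    {
        ProcessBuilder builder = new ProcessBuilder (buildCommand (pdfFileName, tiffFileName, dpi, page));
        builder.inheritIO ();
        return builder.start ().waitFor ();
    }

    public static void main (String[] args) throws Exception
    {
        if (args.length < 2)
        {
            System.err.println ("usage: GhostscriptTiff <input.pdf> <output.tif>");
            return;
        }
        int exitCode = render (args[0], args[1], 600, 1);
        System.out.println ("gs exited with code " + exitCode);
    }
}
```

Passing the arguments as a list rather than one concatenated string avoids quoting problems with file names that contain spaces.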

2 Answers


Upgrade to JDK 1.8.0_191 (released in October 2018) or later, or to JDK 9.0.4 or later.

From the PDFBox docs:

PDFBox and Java 8

Important notice when using PDFBox with Java 8 before 1.8.0_191 or Java 9 before 9.0.4

Due to the change of the java color management module towards “LittleCMS”, users can experience slow performance in color operations. A solution is to disable LittleCMS in favor of the old KCMS (Kodak Color Management System) by:

Starting with -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider, or calling

System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")

Sources:

https://bugs.openjdk.java.net/browse/JDK-8041125
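As a rough sketch, the version thresholds from the notice can be checked at startup so that the property is only set on affected runtimes. The class `KcmsWorkaround` and its methods are hypothetical helpers, not part of PDFBox, and the parsing assumes version strings of the form `1.8.0_161` or `9.0.1`. The sketch also assumes the property has to be set before any Java 2D color management code runs, so it would have to be called very early in `main`.

```java
import java.util.ArrayList;
import java.util.List;

final class KcmsWorkaround
{
    // Returns true for Java 8 before 1.8.0_191 and Java 9 before 9.0.4,
    // the ranges named in the PDFBox notice.
    static boolean isAffected (String javaVersion)
    {
        // Collect the first four numeric components, e.g. "1.8.0_161" -> 1, 8, 0, 161.
        List<Integer> n = new ArrayList<> ();
        for (String part : javaVersion.split ("[^0-9]+"))
        {
            if (!part.isEmpty () && n.size () < 4)
                n.add (Integer.parseInt (part));
        }
        while (n.size () < 4)
            n.add (0);

        if (n.get (0) == 1 && n.get (1) == 8)
            return n.get (2) == 0 && n.get (3) < 191;   // 1.8.0_x with x < 191
        if (n.get (0) == 9)
            return n.get (1) == 0 && n.get (2) < 4;     // 9.0.x with x < 4
        return false;
    }

    // Sets the system property only on affected runtimes; returns whether it was set.
    static boolean applyIfNeeded ()
    {
        if (!isAffected (System.getProperty ("java.version")))
            return false;
        System.setProperty ("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider");
        return true;
    }

    public static void main (String[] args)
    {
        System.out.println ("KCMS workaround applied: " + applyIfNeeded ());
    }
}
```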

Sundararaj Govindasamy

In my experiments, this extra slowness only affects the first rendered page of a document. If you render all pages of a multi-page document, every page after the first one renders faster. The absolute rendering speed also depends very much on the DPI value used.

Render 6 document pages at 600 DPI
4.903s: page 0 rendered as image.
4.205s: page 1 rendered as image.
3.946s: page 2 rendered as image.
3.866s: page 3 rendered as image.
3.761s: page 4 rendered as image.
3.633s: page 5 rendered as image.

Render 6 document pages at 300 DPI
3.241s: page 0 rendered as image.
1.308s: page 1 rendered as image.
1.155s: page 2 rendered as image.
1.156s: page 3 rendered as image.
1.109s: page 4 rendered as image.
1.083s: page 5 rendered as image.

Render 6 document pages at 150 DPI
2.507s: page 0 rendered as image.
0.555s: page 1 rendered as image.
0.386s: page 2 rendered as image.
0.373s: page 3 rendered as image.
0.410s: page 4 rendered as image.
0.361s: page 5 rendered as image.

Render 6 document pages at 72 DPI
2.455s: page 0 rendered as image.
0.333s: page 1 rendered as image.
0.213s: page 2 rendered as image.
0.190s: page 3 rendered as image.
0.175s: page 4 rendered as image.
0.171s: page 5 rendered as image.

I think the problem is that AWT does all of its rendering in software. At a constant pixel fill rate, the rendering time then scales quadratically with the DPI value, because doubling the DPI quadruples the number of pixels to fill. The slowness of the first page is probably some initialization overhead. (But that's all a wild guess at the moment.)
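To make the quadratic growth concrete, here is a small sketch that computes the approximate number of pixels per page at the DPI values tested above. The A4 page size (595 x 842 points) is an assumption, since the actual page size of the test document is not stated, and the rounding is only an approximation of how a renderer sizes its image. `DpiScaling` and `pixelCount` are names made up for this example.

```java
final class DpiScaling
{
    // PDF page sizes are given in points, 72 per inch by default, so the
    // image dimensions in pixels are roughly size * dpi / 72.
    static long pixelCount (double widthPt, double heightPt, int dpi)
    {
        long widthPx = Math.round (widthPt * dpi / 72.0);
        long heightPx = Math.round (heightPt * dpi / 72.0);
        return widthPx * heightPx;
    }

    public static void main (String[] args)
    {
        double widthPt = 595;    // assumed A4 width in points
        double heightPt = 842;   // assumed A4 height in points
        long base = pixelCount (widthPt, heightPt, 72);

        for (int dpi : new int[] { 72, 150, 300, 600 })
        {
            long pixels = pixelCount (widthPt, heightPt, dpi);
            System.out.println (String.format ("%d DPI: %d pixels (%.1fx the 72 DPI count)",
                                               dpi, pixels, (double) pixels / base));
        }
    }
}
```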

mipa
  • If the PDF is well designed, some resources like fonts, images, colorspaces are used in all pages so they're opened in the first one. – Tilman Hausherr Nov 18 '19 at 09:44