I am trying to extract all the images from a PDF file using PDFBox. It works fine for PDFs containing JPEG and PNG images, but it does not work for OpenJPEG2000 images. I am getting the error below:

org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed

The same exception occurs with every version of PDFBox I tried, including the standalone jar.

I included the necessary dependencies in pom.xml as well.

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
</dependency>
<!-- For legal reasons (incompatible license), these two dependencies
     are to be used only in the tests and may not be distributed. -->
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
</dependency>
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
</dependency>

Any help will be appreciated.

Guna Sekaran
  • Why the test scope? – Tilman Hausherr Sep 29 '20 at 16:28
  • Or in other words, does the exception also occur in tests, with those jai dependencies present? – mkl Sep 30 '20 at 07:36
  • Thanks. I removed the test scope and it is working now. I have updated the question as well. – Guna Sekaran Oct 10 '20 at 04:54
  • I am facing another issue now. I exported the jar and ran it from the Windows command prompt, and I am getting the same exception again: org.apache.pdfbox.contentstream.PDFStreamEngine operatorException SEVERE: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed – Guna Sekaran Oct 10 '20 at 04:56
  • That's a different question. Solution: use the classpath and call it like this: java -cp "pdfbox-app-2.0.21.jar;lib/*" org.apache.pdfbox.tools.PDFBox ExtractImages. Copy the extra jar files into the lib subdirectory. – Tilman Hausherr Oct 10 '20 at 10:29
  • I am new to command-line execution. Can you explain this in more detail? How do I use the classpath? – Guna Sekaran Oct 12 '20 at 06:22
  • When I examined the pdfbox jar, both the JAI core and jpeg2000 packages were present. How can I make it run using that single jar alone? – Guna Sekaran Oct 12 '20 at 06:29
  • I thought you were asking about the pdfbox app. I don't know your project, whether this is a "fat jar" with everything or not. Usually it's not and all the jars are in a separate directory. Then the solution is to use java -cp "project.jar;lib/*" . The ";" is ":" on linux. – Tilman Hausherr Oct 12 '20 at 08:03
  • It's PDFBox 2.0.21 only. Nothing was changed in the source. I am just building the project and running it. – Guna Sekaran Oct 12 '20 at 09:49
  • If it is PDFBox 2.0.21 only, then use the command line string I mentioned and copy the two jar files into the lib subdirectory. It is the command I also use myself when running from the command line. – Tilman Hausherr Oct 12 '20 at 11:03
  • Thanks. It's working. I had kept the jar file in the wrong place before. – Guna Sekaran Oct 12 '20 at 16:05

1 Answer

Copy the imaging-related .jar files into the lib subdirectory, and then use this command line:

java -cp "pdfbox-app-2.0.21.jar;lib/*" org.apache.pdfbox.tools.PDFBox ExtractImages <parameters>

Use ";" as the classpath separator on Windows and ":" on Linux.

org.apache.pdfbox.tools.PDFBox is the name of the main class.
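
If it is unclear whether the extra jars are actually being picked up from the classpath, a small check like the one below may help. It is only a sketch: the class name JpxReaderCheck is made up for illustration, and it assumes that the jai-imageio JPEG2000 plugin registers itself with javax.imageio under the format name "JPEG2000". It simply asks ImageIO which readers are currently registered.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

// Hypothetical helper class, not part of PDFBox.
class JpxReaderCheck {

    // Returns true if at least one ImageIO reader is registered for the given format name.
    static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Rescan the classpath so plugins shipped in additional jars are registered.
        ImageIO.scanForPlugins();

        // "png" and "jpeg" ship with the JDK; "JPEG2000" is assumed to come from the JAI plugin jar.
        for (String format : new String[] {"png", "jpeg", "JPEG2000"}) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

Run it with the same -cp value as the command above (plus the directory containing the compiled class). If the JPEG2000 line reports false, the plugin jar is probably not on the classpath, for example because it is not in the lib subdirectory.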

Tilman Hausherr