I'm trying to convert an image file to text using the tess4j Maven dependency.
Dependencies in pom.xml:

<!-- OCR dependency -->
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>3.4.0</version>
        <exclusions>
            <exclusion>
                <groupId>net.java.dev.jna</groupId>
                <artifactId>jna</artifactId>
            </exclusion>
            <exclusion>
                <groupId>net.sourceforge.lept4j</groupId>
                <artifactId>lept4j</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.4.0</version>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.lept4j</groupId>
        <artifactId>lept4j</artifactId>
        <version>1.5.0</version>
    </dependency>  

My code:

public String convertImageToText(String imageFilePath) throws TesseractException {

    File imageFile = new File("imageFilePath");
    ITesseract iTesseract = new Tesseract();
    ImageIO.scanForPlugins();
    String result = iTesseract.doOCR(imageFile);
    System.out.println("Converted text is: "+result);
    return result;
}

However, when I run my program, I always get the exception below:

Exception in thread "main" net.sourceforge.tess4j.TesseractException: java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:215)
at utilities.HelperMethods.convertImageToText(HelperMethods.java:218)
at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:408)
at utilities.HelperMethods.main(HelperMethods.java:250)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
Caused by: java.lang.RuntimeException: Unsupported image format. May need to install JAI Image I/O package.
https://java.net/projects/jai-imageio/
at utilities.HelperMethods.convertImageToText(HelperMethods.java:218)
at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:408)
at utilities.HelperMethods.main(HelperMethods.java:250)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)  

All required dependencies, such as JAI and lept4j, are present in my repository. I have also tried the solutions suggested on this forum, but I have been unable to resolve this error.
Any help would be appreciated.

Thanks

Update: I have attached the JPG file here.

Anuja
  • And what type of image file are you trying to read? – VGR Jun 16 '17 at 13:31
  • I tried with JPG and PNG files. I get the same error with both formats. – Anuja Jun 16 '17 at 15:38
  • Can you post one of the problematic image files to imgur, so we can try loading it with ImageIO? – VGR Jun 16 '17 at 15:43
  • Attached the jpg file in the question above. – Anuja Jun 16 '17 at 15:56
  • I am able to load that image, both with ImageIO.read, and by explicitly creating an ImageReader and obtaining an Iterator. The problem is not the image file. Try printing `imageFile.canRead()` before loading the file. – VGR Jun 16 '17 at 16:11
  • Silly me! I was passing the wrong filename... I accidentally hard-coded the variable name as the filename. But I'm now facing java.lang.UnsatisfiedLinkError... Trying to resolve that. Thanks for the quick response though! – Anuja Jun 16 '17 at 16:27
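
The cause found in the comments above (a string literal passed where the variable was intended, and the suggested `canRead()` check) can be sketched as follows. This is a minimal illustration, not part of the original code: the class `ImagePathCheck` and the helper `requireReadableFile` are hypothetical names, and the sketch only validates the path, so it runs without tess4j on the classpath.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper class, used only for this illustration.
class ImagePathCheck {

    // Returns the file if it exists and is readable; otherwise fails with a clear message.
    static File requireReadableFile(String imageFilePath) throws FileNotFoundException {
        // Use the variable itself. new File("imageFilePath") would point at a
        // file literally named "imageFilePath" in the working directory.
        File imageFile = new File(imageFilePath);
        if (!imageFile.isFile() || !imageFile.canRead()) {
            throw new FileNotFoundException(
                    "Cannot read image file: " + imageFile.getAbsolutePath());
        }
        return imageFile;
    }

    public static void main(String[] args) throws Exception {
        // A temporary file stands in for a real image, just to exercise the check.
        File tmp = File.createTempFile("ocr-input", ".jpg");
        tmp.deleteOnExit();
        System.out.println("Readable: " + requireReadableFile(tmp.getPath()).getPath());

        try {
            requireReadableFile("no-such-dir/imageFilePath");
        } catch (FileNotFoundException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Calling such a check before `doOCR` would presumably have reported the unreadable path directly, rather than an image-format error.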

1 Answer

It cannot determine an appropriate ImageReader for the given file format. So probably either 1) the file format cannot be determined properly (weird file extension?), or 2) there is no image reader registered for the format you're trying to use.

See ImageIO.getImageReadersByFormatName.
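
Both possibilities can be checked with the standard `javax.imageio` API. The following is a minimal sketch; the class `ImageReaderCheck` and its method names are made up for this example. The first method asks whether any reader is registered under a format name, and the second asks whether any reader recognises a file's actual contents, regardless of its extension.

```java
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;

// Hypothetical diagnostic class, used only for this illustration.
class ImageReaderCheck {

    // True if at least one ImageReader is registered under the format name (e.g. "jpg", "png").
    static boolean hasReaderForFormat(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    // True if some registered ImageReader claims it can decode the file's contents.
    static boolean hasReaderForFile(File file) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(file)) {
            if (in == null) {
                // No stream could be created, e.g. because the file does not exist.
                return false;
            }
            return ImageIO.getImageReaders(in).hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Registered formats: "
                + String.join(", ", ImageIO.getReaderFormatNames()));
        System.out.println("Reader for jpg: " + hasReaderForFormat("jpg"));
        System.out.println("Reader for png: " + hasReaderForFormat("png"));
        if (args.length > 0) {
            System.out.println("Reader for " + args[0] + ": "
                    + hasReaderForFile(new File(args[0])));
        }
    }
}
```

If `hasReaderForFormat` returns true for the format but `hasReaderForFile` returns false for the path being used, the path or file itself is the more likely culprit than a missing plugin.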

Fabian