I am trying to run Tesseract (through Tess4J) to extract the text from an image, but I get the following error:

Exception in thread "main" java.lang.Error: Invalid memory access
at com.sun.jna.Native.invokePointer(Native Method)
at com.sun.jna.Function.invokePointer(Function.java:477)
at com.sun.jna.Function.invoke(Function.java:411)
at com.sun.jna.Function.invoke(Function.java:323)
at com.sun.jna.Library$Handler.invoke(Library.java:236)
at com.sun.proxy.$Proxy0.TessBaseAPIGetUTF8Text(Unknown Source)
at net.sourceforge.tess4j.Tesseract.getOCRText(Tesseract.java:436)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:291)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:212)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:196)
at Crop_Image.main(Crop_Image.java:98)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

I am loading a JPG image file that contains English text. This is how I load the file and try to get the text from it:

 public static void main(String[] args){

    String result = "";

    File imageFile = new File("C:\\Users\\user\\Desktop\\Untitled.jpg");
    Tesseract instance = new Tesseract();

    try {
         result = instance.doOCR(imageFile);
         System.out.println(result);

    } catch (Exception e) {
        e.printStackTrace();
        System.err.println(e.getMessage());
    }
}

I am also using Maven in my project; here are the dependencies from my pom file:

<dependencies>

    <dependency>
        <groupId>nu.pattern</groupId>
        <artifactId>opencv</artifactId>
        <version>2.4.9-4</version>
    </dependency>

    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>3.1.0</version>
    </dependency>

</dependencies>

What could be the cause of this error?

user6006748

2 Answers

Looking at your code, the problem is probably in how you initialize Tesseract. Since you are using Maven, you need to point Tesseract at the exact location of the tessdata directory, as nguyenq suggested. Here is what you should do:

  import java.io.File;
  import net.sourceforge.tess4j.Tesseract;
  import net.sourceforge.tess4j.util.LoadLibs;

  public static String Image_To_Text(String image_path){

    String result = "";

    // Use the image path passed in by the caller
    File imageFile = new File(image_path);

    Tesseract instance = new Tesseract();
    // In case you don't have your own tessdata, let it be extracted for you
    File tessDataFolder = LoadLibs.extractTessResources("tessdata");

    // Set the tessdata path
    instance.setDatapath(tessDataFolder.getAbsolutePath());

    try {
        result = instance.doOCR(imageFile);

    } catch (Exception e) {
        e.printStackTrace();
    }

    return result;
}
user3182266

You need to call instance.setDatapath with the parent directory of the tessdata folder.

File tessDataFolder = LoadLibs.extractTessResources("tessdata"); // Maven build bundles English data
instance.setDatapath(tessDataFolder.getParent());

See http://tess4j.sourceforge.net/tutorial.
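
Before calling doOCR, it can help to confirm that the language file is where the error message says Tesseract is looking, that is, at tessdata/eng.traineddata under the datapath. The following is only a minimal sketch using the standard library: the class name TessdataCheck and the method hasLanguageData are made up for illustration and are not part of Tess4J, and it assumes the datapath is the parent directory of tessdata as described above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TessdataCheck {

    // Hypothetical helper (not a Tess4J API): returns true if
    // <datapath>/tessdata/<lang>.traineddata exists as a regular file.
    static boolean hasLanguageData(String datapath, String lang) {
        Path trained = Paths.get(datapath, "tessdata", lang + ".traineddata");
        return Files.isRegularFile(trained);
    }

    public static void main(String[] args) {
        // Default to the current directory, which is where the error in the
        // question was looking ("./tessdata/eng.traineddata").
        String datapath = args.length > 0 ? args[0] : ".";
        Path expected = Paths.get(datapath, "tessdata", "eng.traineddata").toAbsolutePath();
        System.out.println("Looking for: " + expected);
        System.out.println("Found: " + hasLanguageData(datapath, "eng"));
    }
}
```

If this reports that the file is missing for the path you pass to setDatapath, the "Failed loading language 'eng'" message is expected, and the directory you are pointing at is the first thing to fix.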

nguyenq
  • Yes, I figured that much, but I am using `maven`, so even if I point to the directory of the `.jar` file I get the same error. – user6006748 May 14 '16 at 10:02
  • @nguyenq Hi Nguyen! I'm a big fan of your product VietOCR, which uses Tesseract. I have to develop something similar that can recognize Arabic characters. Can I get your code and try to modify it to recognize Arabic? I tried Tesseract, but the ara.traineddata file is not very good and I didn't get the results I wanted. Can you help me with this? – Hohenheim May 26 '16 at 11:04