Why does my Tesseract instance require me to explicitly set my datapath, but doesn't want to read the environment variable?

Let me clarify: running the code

ITesseract tesseract = new Tesseract();
String result = tesseract.doOCR(myImage);

throws an error:

Error opening data file ./tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the 
parent directory of your "tessdata" directory.

I have already set the environment variable, i.e. running

echo $TESSDATA_PREFIX returns /usr/share/tessdata/

Now, setting the datapath explicitly in my code, i.e.:

ITesseract tesseract = new Tesseract();
tesseract.setDatapath("/usr/share/tessdata/");
String result = tesseract.doOCR(myImage);

works perfectly. Why? I'm using Manjaro 17.0.5.

Ognjen Mišić

1 Answer

Tess4J was initially designed to use the data files bundled in its own tessdata folder, so it does not pick up TESSDATA_PREFIX by itself. In your case, if you want to read from the system's standard tessdata directory, set the datapath as follows:

tesseract.setDatapath(System.getenv("TESSDATA_PREFIX"));
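One caveat with the one-liner above: System.getenv returns null when the variable is not set in the environment of the Java process (for example, when it is exported only in an interactive shell and the application is launched from an IDE). A minimal sketch of a guard, assuming a hypothetical helper named resolveDatapath and a fallback of "./tessdata" (both are illustrative names, not part of Tess4J):

```java
public class TessdataPath {

    // Hypothetical helper (not a Tess4J API): returns the environment value
    // when it is set and non-blank, otherwise the supplied fallback.
    static String resolveDatapath(String envValue, String fallback) {
        if (envValue == null || envValue.trim().isEmpty()) {
            return fallback;
        }
        return envValue;
    }

    public static void main(String[] args) {
        // "./tessdata" is an assumed fallback that matches the path in the error message.
        String datapath = resolveDatapath(System.getenv("TESSDATA_PREFIX"), "./tessdata");
        System.out.println("Using datapath: " + datapath);

        // With Tess4J on the classpath, the value would then be passed on:
        // ITesseract tesseract = new Tesseract();
        // tesseract.setDatapath(datapath);
    }
}
```

Printing the resolved value is also a quick way to check whether the JVM actually sees the variable that echo shows in the terminal.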

nguyenq
  • System.getenv will just return the exact same string which I've already set, though it's not hardcoded. But my question revolves more around why the library is unable to locate its own `share` directory without me having to explicitly tell it where to look. – Ognjen Mišić Nov 03 '17 at 08:21
  • By default, tess4j uses its local `tessdata` folder, not Tesseract's. It's not aware of the path defined by the `TESSDATA_PREFIX` variable. – nguyenq Nov 05 '17 at 16:57