I am trying to work with OCR (Optical Character Recognition). I have a sample image and I want to read data out of it. Below is my sample image file.
I have used the Tess4J API to read the text from the image. Here is my code:
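import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OcrDemo { // class name is arbitrary

    // Runs Tesseract OCR on the image at filePath and returns the recognized text.
    public static String crackImage(String filePath) {
        File imageFile = new File(filePath);
        ITesseract instance = new Tesseract();
        instance.setLanguage("eng"); // uses tessdata/eng.traineddata
        try {
            return instance.doOCR(imageFile);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
            return "Error while reading image";
        }
    }

    public static void main(String[] args) {
        String results = crackImage("D:\\data\\testImage.PNG");
        System.out.print(results);
    }
}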
Below is the dependency I have in my pom.xml file:
<dependencies>
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
I have also created a tessdata\eng.traineddata structure in my project directory.
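One thing worth ruling out is that the JVM is not looking where I think it is, for example when the program is launched from an IDE with a different working directory. The sketch below is a stdlib-only check, assuming the base directory is the JVM's working directory and the layout is `<base>/tessdata/eng.traineddata`; how Tess4J 3.2.1 actually resolves its data path is an assumption here, not something I have verified. The class and method names are hypothetical, and the check only confirms the file exists and is non-empty, not that it matches the Tesseract version bundled with tess4j.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks the <baseDir>/tessdata/<lang>.traineddata layout.
public class TessdataCheck {

    // Builds the path where the traineddata file is expected under baseDir.
    static Path trainedDataPath(Path baseDir, String lang) {
        return baseDir.resolve("tessdata").resolve(lang + ".traineddata");
    }

    // True only if the file exists, is a regular file, and is non-empty.
    static boolean hasTrainedData(Path baseDir, String lang) {
        Path p = trainedDataPath(baseDir, lang);
        try {
            return Files.isRegularFile(p) && Files.size(p) > 0;
        } catch (IOException e) {
            return false; // an unreadable file is treated as missing
        }
    }

    public static void main(String[] args) {
        // Assumption: the JVM's working directory is the base directory to search.
        Path base = Paths.get("").toAbsolutePath();
        System.out.println("Looking for: " + trainedDataPath(base, "eng"));
        System.out.println("Found: " + hasTrainedData(base, "eng"));
    }
}
```

If this prints `Found: false` when run the same way as the OCR program, the traineddata file is not where that working directory expects it.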
When I run the code, it runs without errors, but I get wrong results (possibly in a different language), like the following:
Creale a Voumhe metauzoa mwwer usmg szz
I am not sure why this text is printed, even though I set the language to English explicitly. Can someone help me solve this issue?
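Garbled output with the correct language set can also come from the input image itself: Tesseract tends to do poorly on small or low-resolution text. That is a general observation, not something confirmed for this particular image. A minimal sketch of one thing to try, using only `java.awt`, is to upscale the image and convert it to grayscale before OCR; the class name `OcrPreprocess` and the scale factor are hypothetical choices.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical preprocessing helper: upscale + grayscale before OCR.
public class OcrPreprocess {

    // Scales src by an integer factor with bicubic interpolation and
    // draws it into an 8-bit grayscale image (colors are converted on draw).
    static BufferedImage upscaleGray(BufferedImage src, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int w = src.getWidth() * factor;
        int h = src.getHeight() * factor;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic input just to show the resulting size.
        BufferedImage src = new BufferedImage(10, 5, BufferedImage.TYPE_INT_RGB);
        BufferedImage out = upscaleGray(src, 2);
        System.out.println(out.getWidth() + "x" + out.getHeight());
    }
}
```

The result could then be passed to Tess4J's `doOCR(BufferedImage)` overload, e.g. `instance.doOCR(OcrPreprocess.upscaleGray(ImageIO.read(imageFile), 2))`. Note that `ImageIO.read` returns null for formats it cannot decode.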