I am using the Tess4J API to read digits from an image.

My code is below:

import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

// Class name is illustrative; the original post showed only main().
public class ReadDigits {
    public static void main(String[] args) {
        final File imageFile = new File("C:\\Users\\goku\\Desktop\\myimage.png");
        System.out.println("Image found");

        final ITesseract instance = new Tesseract();
        // Restrict recognition to digits only.
        instance.setTessVariable("tessedit_char_whitelist", "0123456789");
        instance.setDatapath("C:\\Users\\goku\\Downloads\\Tess4J");
        instance.setLanguage("eng");

        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            e.printStackTrace();
        }
    }
}

Image attached.

The program reads the digits incorrectly, and I cannot find the cause.

Output:

1 1 3 251

regards, Vasu

PredragDj
V.B

3 Answers

Rescaling the image to 300 DPI should give the correct result.
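As a rough illustration of the rescaling idea, here is a minimal sketch that enlarges an image with plain JDK classes before OCR. It assumes the source image is around 96 DPI (a common screen resolution, not something stated in the question), so the scale factor is 300 / 96. The class name `RescaleForOcr` and the helper `upscale` are hypothetical names chosen for this example.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

class RescaleForOcr {

    /** Returns a copy of src enlarged by the given factor, using bicubic interpolation. */
    static BufferedImage upscale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bicubic interpolation keeps glyph edges smoother than nearest-neighbour.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    public static void main(String[] args) {
        // Stand-in for the real image; assumed to be 96 DPI.
        BufferedImage small = new BufferedImage(96, 48, BufferedImage.TYPE_INT_RGB);
        BufferedImage big = upscale(small, 300.0 / 96.0);
        System.out.println(big.getWidth() + "x" + big.getHeight()); // prints 300x150
    }
}
```

In the question's code, the image could be loaded with `ImageIO.read(imageFile)`, passed through `upscale`, and the result handed to `instance.doOCR(BufferedImage)` instead of the file. Whether a factor of about 3 is right depends on the actual resolution of the image.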

nguyenq

This is how to preprocess the image with im4java (ImageMagick) so that Tess4J (Tesseract) can read it:

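import java.io.File;
import java.io.IOException;
import org.im4java.core.IM4JavaException;
import org.im4java.core.IMOperation;
import org.im4java.core.ImageMagickCmd;

// cmd.run() also throws InterruptedException and IM4JavaException, so they are declared here.
private static File processImage(File img)
        throws IOException, InterruptedException, IM4JavaException {
    File newImg = File.createTempFile("asdf", ".png");

    ImageMagickCmd cmd = new ImageMagickCmd("convert");
    IMOperation op = new IMOperation();

    op.addImage(img.getAbsolutePath());
    // Strip metadata, resample to 300 DPI, convert to grayscale, then binarize and trim borders.
    op.strip().resample(300).colorspace("gray").autoLevel().threshold(35000).type("bilevel").depth(8).trim();
    op.addImage(newImg.getAbsolutePath());
    cmd.run(op);

    return newImg;
}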
Erikas
It might be the trained data. I used the trained data from the tesseract-ocr-w64-setup-v4.1.0.20190314.exe Windows binary, found at https://digi.bib.uni-mannheim.de/tesseract/, with the datapath set as follows:

instance.setDatapath("C:\\Program Files\\Tesseract-OCR\\tessdata");

I do get a warning about the resolution, but the result is correct: 471871882819

Patrick S