
I am trying to use OCR frameworks to recognize this type of image. These are two examples of the letter G:

[image: two sample letters G]

I tried using aocr.jar from Asprise, but this code does not seem to do the trick:

import com.asprise.ocr.Ocr;
import java.io.File;

public class textRecognizer {

    public static void main(String[] args) {
        Ocr.setUp(); // one-time setup
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FAST);
        try {
            String s = ocr.recognize(new File[] {new File("C:\\Users\\juchtdi\\Pictures\\letter.png")}, Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_PLAINTEXT, 0, null);
            System.out.println(s.length()); // number of recognized characters
            System.out.println(s);
        } finally {
            ocr.stopEngine(); // release the engine even if recognition throws
        }
    }
}

Does anyone have an idea how I can make this work, possibly with another framework?

thanks :)

Edit: The code compiles and runs without any exceptions. s.length() returns 0, so it seems the engine reads nothing at all.

When I replace the image with an image of real text, it outputs the text perfectly.

I expected, or at least hoped, that it would return a single G.

dendimiiii
  • That image looks really tough. Are you sure that there is any framework that can handle that? – Simon Sep 09 '14 at 09:28
  • Not really sure at all. Basically hoping there is. It also doesn't have to be a Java framework. Also, thank you BackSlash. – dendimiiii Sep 09 '14 at 09:34
  • Please improve your question: What did you expect, how does it fail and what did you try? – llogiq Sep 09 '14 at 09:39

1 Answer


I don't think you can get an OCR framework to recognize the letter in this image without considerable preprocessing of the image.

Here's a rough idea for some preprocessing that you might try. It's a lot of work, it requires a lot of tweaking of threshold values and the like, and even then I can't guarantee that it will work:

  1. For each dot, calculate the density of the dots surrounding it, and filter out all dots located where the dot density is low.
  2. Then do one of the following:
     a) Use morphological operations to try to merge the remaining dots into one object.
     b) Try to find the contours of the letter and then fill it in using a watershed algorithm.
  3. Now run the OCR as you did before.
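
To make steps 1 and 2a a bit more concrete, here is a minimal sketch in plain Java. It assumes the image has already been thresholded into a boolean grid (true = dot). The class name DotPreprocessor, its method names, and the radius and minNeighbors parameters are all made up for illustration; suitable values would have to be found by experiment, and there is no guarantee the result is clean enough for the OCR engine.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper; all names and parameters are illustrative assumptions. */
final class DotPreprocessor {

    /** Step 1: keep a dot only if at least minNeighbors other dots lie within radius. */
    static boolean[][] filterSparseDots(boolean[][] dots, int radius, int minNeighbors) {
        int h = dots.length, w = dots[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!dots[y][x]) continue;
                // The window count includes the dot itself, so subtract one.
                out[y][x] = countInWindow(dots, x, y, radius) - 1 >= minNeighbors;
            }
        }
        return out;
    }

    /** Step 2a: morphological closing (dilation, then erosion) with a square window. */
    static boolean[][] close(boolean[][] img, int radius) {
        return morph(morph(img, radius, true), radius, false);
    }

    /** Converts the grid to a black-on-white image that can be saved and passed to OCR. */
    static BufferedImage toImage(boolean[][] img) {
        int h = img.length, w = img[0].length;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out.setRGB(x, y, img[y][x] ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    // Dilation sets a pixel if any pixel in its window is set;
    // erosion keeps it only if every in-bounds pixel in its window is set.
    private static boolean[][] morph(boolean[][] img, int radius, boolean dilate) {
        int h = img.length, w = img[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int set = countInWindow(img, x, y, radius);
                out[y][x] = dilate ? set > 0 : set == windowArea(w, h, x, y, radius);
            }
        }
        return out;
    }

    // Counts set pixels in the square window around (cx, cy), clipped to the image.
    private static int countInWindow(boolean[][] img, int cx, int cy, int radius) {
        int count = 0;
        int yEnd = Math.min(img.length - 1, cy + radius);
        int xEnd = Math.min(img[0].length - 1, cx + radius);
        for (int y = Math.max(0, cy - radius); y <= yEnd; y++) {
            for (int x = Math.max(0, cx - radius); x <= xEnd; x++) {
                if (img[y][x]) count++;
            }
        }
        return count;
    }

    // Number of in-bounds pixels in the square window around (cx, cy).
    private static int windowArea(int w, int h, int cx, int cy, int radius) {
        int wx = Math.min(w - 1, cx + radius) - Math.max(0, cx - radius) + 1;
        int wy = Math.min(h - 1, cy + radius) - Math.max(0, cy - radius) + 1;
        return wx * wy;
    }
}
```

A possible way to use it: call filterSparseDots first, pass the result to close, then write toImage(...) to a PNG with javax.imageio.ImageIO.write and give that file to the OCR engine. The brute-force window scan is slow on large images; it is only meant to show the idea.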
Simon
  • I believe your solution is the best, but parametrization is a nightmare. Since there are "many" dots everywhere, defining what counts as a low density will be very tough... – rlinden Sep 09 '14 at 12:24
  • Thanks for your answer. I'll try to work something out with this idea as a base. Thank you! – dendimiiii Sep 09 '14 at 12:51