How to filter a texture from an image for OCR

Question

I'm trying to run OCR on some forms that have a background texture, as shown below:

Original Image

This texture causes OCR programs to ignore the area, tagging it as an image region.

I tried morphology. A closing with a star-shaped structuring element gives the following result:
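For reference, a grayscale closing is a dilation (local maximum) followed by an erosion (local minimum) over the structuring element. The NumPy-only sketch below assumes a hypothetical "star" footprint built as the union of a plus sign and an X; it is not necessarily the exact star the asker used. In practice `cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)` or scikit-image's `skimage.morphology.closing` do the same job. On dark text over a light background, closing fills in dark specks smaller than the footprint, which also explains why thin character strokes suffer.

```python
import numpy as np

def star_footprint(radius: int = 2) -> np.ndarray:
    """Hypothetical star footprint: union of a plus sign and an X of the given radius."""
    size = 2 * radius + 1
    fp = np.zeros((size, size), dtype=bool)
    idx = np.arange(size)
    fp[radius, :] = True            # horizontal arm
    fp[:, radius] = True            # vertical arm
    fp[idx, idx] = True             # main diagonal
    fp[idx, size - 1 - idx] = True  # anti-diagonal
    return fp

def _rank_filter(img: np.ndarray, fp: np.ndarray, reducer) -> np.ndarray:
    """Combine every footprint-shifted copy of img with reducer (np.maximum or np.minimum)."""
    ry, rx = fp.shape[0] // 2, fp.shape[1] // 2
    padded = np.pad(img, ((ry, ry), (rx, rx)), mode="edge")
    h, w = img.shape
    out = None
    for dy, dx in zip(*np.nonzero(fp)):
        shifted = padded[dy:dy + h, dx:dx + w]
        out = shifted.copy() if out is None else reducer(out, shifted)
    return out

def grayscale_closing(img: np.ndarray, fp: np.ndarray) -> np.ndarray:
    """Closing = dilation (local max) followed by erosion (local min).

    The star footprint is symmetric, so the dilation needs no reflected footprint.
    """
    dilated = _rank_filter(img, fp, np.maximum)
    return _rank_filter(dilated, fp, np.minimum)
```

An isolated dark pixel disappears completely, while a dark block larger than the footprint keeps its interior.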

Closing operation

This result is still not good enough for the OCR.

When I manually erase the 'pepper' and then apply adaptive thresholding, I get the following image, which gives good OCR results:
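Adaptive thresholding compares each pixel with the mean of its own neighbourhood instead of one global value, which is what lets it cope with uneven backgrounds. The sketch below mirrors the "mean minus a constant" rule of OpenCV's `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_MEAN_C`, computed with a summed-area table; `block_size=15` and `c=10` are arbitrary example values, not tuned for these forms.

```python
import numpy as np

def adaptive_mean_threshold(img: np.ndarray, block_size: int = 15, c: float = 10) -> np.ndarray:
    """Return 255 where img > (local mean - c), else 0 (dark text becomes 0)."""
    if block_size % 2 == 0:
        raise ValueError("block_size must be odd")
    r = block_size // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    k = block_size
    # Sum of each k x k window centred on an original pixel.
    window_sum = sat[k:k + h, k:k + w] - sat[:h, k:k + w] - sat[k:k + h, :w] + sat[:h, :w]
    local_mean = window_sum / (k * k)
    return np.where(img > local_mean - c, 255, 0).astype(np.uint8)
```

On a background with a left-to-right illumination gradient, a dark stroke on the bright side is still separated from background that is even darker in absolute terms, which a single global threshold could not do.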

Edited and thresholded

Do you have any other ideas for the problem?

Thanks.

What variables can you control? For example, will the font always be the same? Are you always looking for numerical values? — Leo, Sep 16 '14 at 14:17
I'd prefer to solve it by removing as much of the texture as possible. In principle the font is always the same, and depending on the field there will be letters or numbers. I handle that by whitelisting characters in the OCR engine, so it doesn't find spurious pattern matches in the image. Thanks. — jdcaballerov, Sep 16 '14 at 14:24
Not only numbers. The end goal is to extract the data in structured form. Here is a region of the original: http://i.imgur.com/cg3Ee65.jpg — jdcaballerov, Sep 16 '14 at 15:58

score 1 · Answer 1 · answered Sep 16 '14 at 14:40


For the given image, a 5x5 median filter does a little better than the closing. From there, binarization with an adaptive threshold can remove more of the background.

In any case, the resulting quality will depend heavily on the images, and perfect results can't be expected.
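A median filter replaces each pixel with the median of its neighbourhood, so isolated "pepper" specks covering fewer than half of the window's pixels (12 of 25 for a 5x5 window) vanish while larger dark strokes keep their interior. The NumPy sketch below assumes NumPy 1.20 or later for `sliding_window_view`; with OpenCV the same step is `cv2.medianBlur(img, 5)`, and its output can then be passed to an adaptive threshold as the answer suggests.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img: np.ndarray, size: int = 5) -> np.ndarray:
    """size x size median filter with edge padding; size should be odd."""
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (h, w, size, size)
    # With an odd number of samples the median is an actual pixel value,
    # so casting back to the input dtype is lossless.
    return np.median(windows, axis=(-2, -1)).astype(img.dtype)
```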



This helped, thanks. Perhaps a global threshold such as Otsu's would do better. – jdcaballerov Sep 16 '14 at 14:58
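Otsu's method picks the single global threshold t that maximises the between-class variance of the grey-level histogram, sigma_b^2(t) = (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))), where omega(t) is the fraction of pixels at or below t, mu(t) their cumulative mean and mu_T the overall mean. A minimal sketch for 8-bit images follows; library equivalents are `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)` and `skimage.filters.threshold_otsu`.

```python
import numpy as np

def otsu_threshold(img: np.ndarray) -> int:
    """Return t such that img > t separates the two classes (uint8 input assumed)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # class-0 weight for each candidate t
    mu = np.cumsum(prob * np.arange(256))    # class-0 cumulative mean
    mu_t = mu[-1]                            # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0     # empty classes contribute nothing
    return int(np.argmax(sigma_b))
```

Because the threshold is the same everywhere, it inherits the weakness raised in the reply below: texture that is stronger near the edges of the form ends up on the wrong side of it.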
Unfortunately, the texture remains significant near the edges, so a constant threshold is not the best choice. Maybe you can mask out the unwanted areas after first roughly locating the characters. – Sep 16 '14 at 15:00
This is what I get with global Otsu after a square(5) median: http://i.imgur.com/f8bVjPY.png Perhaps I can now do some connected-component filtering as suggested (though I'm not very familiar with it), since the Canny edges look promising: http://i.imgur.com/C5gJmiY.png Any feedback is very welcome, thanks. – jdcaballerov Sep 16 '14 at 15:49
You can filter on the blob size, but beware that you will lose the dots and commas, and will have trouble with fragmented characters. I don't think that Canny edges on the binarized image will be of any help. – Sep 16 '14 at 15:54
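Filtering on blob size means labelling the connected regions of the binarised image and keeping only those whose pixel count falls in a plausible character range. The sketch below uses 8-connectivity and a plain breadth-first search so it needs only NumPy; `min_area` and `max_area` are hypothetical parameters to be tuned to the known font size. OpenCV's `cv2.connectedComponentsWithStats` returns the same areas much faster. As the comment warns, dots and commas fall below `min_area` and are dropped.

```python
import numpy as np
from collections import deque

def label_components(mask: np.ndarray):
    """Label 8-connected True regions; return (labels, areas) with areas[0] = background."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    areas = [0]
    h, w = mask.shape
    current = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue                      # already part of an earlier component
        current += 1
        labels[y, x] = current
        queue, area = deque([(y, x)]), 0
        while queue:
            cy, cx = queue.popleft()
            area += 1
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = current
                        queue.append((ny, nx))
        areas.append(area)
    return labels, np.array(areas)

def filter_by_area(mask: np.ndarray, min_area: int, max_area: int) -> np.ndarray:
    """Keep only components whose pixel count lies in [min_area, max_area]."""
    labels, areas = label_components(mask)
    keep = (areas >= min_area) & (areas <= max_area)
    keep[0] = False                       # never keep the background label
    return keep[labels]
```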

score 1 · Answer 2 · answered Sep 16 '14 at 14:51


Maybe have a look at this: https://code.google.com/p/ocropus/source/browse/DIRS?repo=ocroold (see ocr-doc-clean).


user1919235


remi · Answer 3 · 2014-09-16T15:14:48.560


The background pattern is very regular and directional, so filtering in the Fourier domain should do a pretty good job here. Try, for example, the Butterworth filter.

A concrete example of such filtering using GIMP can be found here.
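One way to apply this idea is a Butterworth notch reject filter: a periodic texture shows up as bright peaks in the centred spectrum, and each peak (plus its mirror image) is multiplied by H = 1 / (1 + (D0 / D)^(2n)), where D is the distance to the peak. Using the Butterworth shape as a notch reject filter is my assumption; the answer only names the filter. The `peaks` argument (offsets from the spectrum centre, read off a log-magnitude plot of `np.fft.fftshift(np.fft.fft2(img))`), `d0` and `order` are hypothetical inputs you would tune per form.

```python
import numpy as np

def butterworth_notch_reject(img: np.ndarray, peaks, d0: float = 2.0, order: int = 2) -> np.ndarray:
    """Suppress spectral peaks given as (row, col) offsets from the centred DC term."""
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))
    # Frequency offsets of every bin from the centre (fftshift puts DC at index n // 2).
    v, u = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    H = np.ones((h, w))
    for pv, pu in peaks:
        # A real image has a conjugate-symmetric spectrum, so notch both mirror peaks.
        for sv, su in ((pv, pu), (-pv, -pu)):
            d = np.hypot(v - sv, u - su)
            with np.errstate(divide="ignore", over="ignore"):
                H *= 1.0 / (1.0 + (d0 / d) ** (2 * order))  # exactly 0 at the peak
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * H)))
```

With a synthetic form (a dark block on a light background) overlaid by a vertical-stripe texture, notching the single stripe frequency removes most of the difference from the clean image.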

edited Sep 16 '14 at 15:14

answered Sep 16 '14 at 14:52

remi


score 1 · Answer 4 · answered Sep 16 '14 at 15:02

Since you know the font size, you could also use connected-component filtering, perhaps combined with a morphological operation. To retain the commas, don't discard a small connected component if it lies near one whose size is similar to that of the characters you are trying to read.
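A sketch of that rule: components whose area matches the character size are kept, larger ones are discarded, and smaller ones survive only if their bounding box lies within a few pixels of a character-sized component. Measuring proximity between bounding boxes is a simplifying assumption, and `char_min`, `char_max` and `max_gap` are hypothetical parameters that would be derived from the known font size.

```python
import numpy as np
from collections import deque

def components(mask: np.ndarray):
    """Return [(pixels, (y0, x0, y1, x1)), ...] for 8-connected True regions."""
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    comps = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        seen[y, x] = True
        queue, pixels = deque([(y, x)]), []
        while queue:
            cy, cx = queue.popleft()
            pixels.append((cy, cx))
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        ys, xs = zip(*pixels)
        comps.append((pixels, (min(ys), min(xs), max(ys), max(xs))))
    return comps

def bbox_gap(a, b) -> int:
    """Pixel gap between two boxes along the larger axis (0 if they overlap)."""
    dy = max(a[0] - b[2], b[0] - a[2], 0)
    dx = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dy, dx)

def keep_text(mask: np.ndarray, char_min: int, char_max: int, max_gap: int) -> np.ndarray:
    comps = components(mask)
    char_boxes = [bb for px, bb in comps if char_min <= len(px) <= char_max]
    out = np.zeros(mask.shape, dtype=bool)
    for pixels, bb in comps:
        area = len(pixels)
        is_char = char_min <= area <= char_max
        # Small blobs (commas, dots) survive only next to a character-sized blob.
        is_punct = area < char_min and any(bbox_gap(bb, c) <= max_gap for c in char_boxes)
        if is_char or is_punct:
            ys, xs = zip(*pixels)
            out[list(ys), list(xs)] = True
    return out
```

An isolated speck of the same size as a comma is still removed, because no character-sized component is close to it.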
