I am dealing with a kind of captcha that has noisy stripes drawn over the digits. The stripes are straight and drawn in random directions. The colors of the digits and the stripes are random.
The code below is able to recognize the digits in some of these captchas with the help of Tesseract:
from pytesser.pytesser import image_to_string
from PIL import Image, ImageFilter, ImageEnhance
im = Image.open("test.tiff")
im = im.filter(ImageFilter.MedianFilter()) # median filter (3x3 by default): blurs the image, which erases the thin stripes
im = ImageEnhance.Contrast(im).enhance(2) # increase the contrast (to make the image clearer?)
im = im.convert('1') # convert to a black-and-white image
text = image_to_string(im) # run Tesseract on the cleaned image
print("text={}".format(text))
My approach to removing the stripes is to blur the image first and then sharpen it again. The recognition is correct in most cases, but I'm wondering whether there are other approaches that remove the stripes without blurring the digits.
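To show what the filtering step does in isolation, here is a minimal sketch on a synthetic image rather than one of my real captchas. The helper names `make_test_image` and `remove_stripes`, the image size, and the stroke widths are all made up for illustration; it assumes the stripes are about one pixel wide and the digit strokes are several pixels wide, which may not hold for every captcha.

```python
from PIL import Image, ImageDraw, ImageFilter

def make_test_image():
    """Hypothetical stand-in for a captcha: a thick 'digit' stroke plus a thin stripe."""
    im = Image.new("L", (40, 40), 255)          # white grayscale background
    draw = ImageDraw.Draw(im)
    draw.rectangle([10, 10, 19, 29], fill=0)    # thick black block, acts as a digit stroke
    draw.line([(0, 35), (39, 35)], fill=0)      # one-pixel-wide black stripe below it
    return im

def remove_stripes(im, size=3):
    """Median-filter the image, then convert it to black and white (mode '1')."""
    filtered = im.filter(ImageFilter.MedianFilter(size))
    return filtered.convert("1")

im = make_test_image()
out = remove_stripes(im)
print(im.getpixel((20, 35)), out.getpixel((20, 35)))   # a stripe pixel, before and after
print(im.getpixel((15, 20)), out.getpixel((15, 20)))   # a pixel inside the block, before and after
```

In a 3x3 window, the one-pixel stripe here covers only 3 of the 9 pixels, so the median stays at the background value, while the interior of the thick block is left unchanged. The same effect erodes corners and thin parts of the digits, which is the blurring I would like to avoid.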
Any hints are highly appreciated.