I'm trying to crack a particular web CAPTCHA. My plan is to segment the characters and classify them with an ANN. For the features I will mostly use the method of moments, since it seems difficult to remove the noise completely.
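The post doesn't say which moments will be used. A minimal sketch, assuming normalized central moments of a segmented binary glyph (a list of 0/1 rows, 1 = dark pixel) plus Hu's first two invariants built from them. Central moments are translation-invariant; the normalization makes them scale-invariant for continuous shapes (only approximately so on a pixel grid), and φ1, φ2 are also rotation-invariant. The function names are hypothetical.

```python
def normalized_central_moments(img, max_order=3):
    """Return {(p, q): eta_pq} for 2 <= p + q <= max_order.

    img is a list of rows of 0/1 values (1 = dark/text pixel).
    """
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = len(pts)  # zeroth moment = number of dark pixels
    if m00 == 0:
        raise ValueError("empty glyph")
    # Centroid; subtracting it makes the moments translation-invariant
    xc = sum(x for x, _ in pts) / m00
    yc = sum(y for _, y in pts) / m00
    eta = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            if p + q < 2:
                continue  # orders 0 and 1 are constant after normalization
            mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pts)
            # Scale normalization: divide by mu00^(1 + (p + q) / 2)
            eta[(p, q)] = mu / m00 ** (1 + (p + q) / 2)
    return eta


def hu_first_two(eta):
    """Hu's first two rotation-invariant moments from normalized moments."""
    phi1 = eta[(2, 0)] + eta[(0, 2)]
    phi2 = (eta[(2, 0)] - eta[(0, 2)]) ** 2 + 4 * eta[(1, 1)] ** 2
    return phi1, phi2
```

The resulting numbers (for example the seven values from `normalized_central_moments` plus `hu_first_two`) would form the input vector for the ANN.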
The CAPTCHA is very noisy, and unfortunately the noise is the same color as the text, so separating them by color will not work. After some thought, I implemented a flood-fill style algorithm that finds the small disconnected components in the image and removes them. After this step I ended up with something like this:
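The flood-fill code itself isn't shown in the question. A minimal sketch of that kind of filter, assuming a binary image stored as a list of 0/1 rows (1 = dark), 8-connectivity, and a hypothetical `min_size` threshold chosen by hand; it uses an explicit BFS queue instead of recursion so large components cannot overflow the stack.

```python
from collections import deque


def remove_small_components(img, min_size):
    """Return a copy of img with 8-connected dark components smaller than min_size cleared."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            # Flood fill (BFS) collecting every pixel of this component
            comp = []
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                comp.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < w and 0 <= ny < h and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            # Small components are treated as noise and erased
            if len(comp) < min_size:
                for x, y in comp:
                    out[y][x] = 0
    return out
```

This only removes noise that is disconnected from the text; anything touching a letter belongs to the letter's component and survives, which is exactly the leftover noise described next.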
Most of the noise is gone, but some of it remains around the letters themselves, because it touches the text. I'm not an expert on image filters, and I'm finding it very difficult to choose a filter that reduces the remaining noise and enhances the characters. Any ideas on which filter(s) I could use for this purpose?
(Note: I'm not using any image manipulation tool or library for this. I'm writing raw pixel manipulation code, but I can implement most filters given their convolution kernel.)
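One option that fits the "given its convolution kernel" constraint is a 3×3 majority filter: convolve the binary image with an all-ones kernel, which counts the dark pixels in each neighborhood, then keep a pixel only if at least 5 of the 9 are dark. On a 0/1 image this is the same as a 3×3 median filter. A minimal sketch, assuming a list of 0/1 rows (1 = dark) and zero padding outside the border; `threshold` is a hypothetical tuning parameter.

```python
BOX = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]


def correlate3x3(img, kernel):
    """Zero-padded 3x3 correlation (identical to convolution for a symmetric kernel)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    ny, nx = y + ky - 1, x + kx - 1
                    if 0 <= ny < h and 0 <= nx < w:  # pixels outside count as 0
                        acc += kernel[ky][kx] * img[ny][nx]
            out[y][x] = acc
    return out


def majority_filter(img, threshold=5):
    """Keep a pixel only if at least `threshold` of its 3x3 neighborhood is dark."""
    counts = correlate3x3(img, BOX)
    return [[1 if c >= threshold else 0 for c in row] for row in counts]
```

It strips one-pixel strands and spurs attached to the letters, but it also rounds corners and will erase strokes that are only one pixel wide, so `threshold` (or a morphological opening, i.e. erosion followed by dilation, as an alternative) would need tuning against the real images.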
The problem is that, because of this remaining noise, segmenting the characters is difficult. Simply looking for vertical columns with no dark pixels is clearly not going to work, since the noise bridges the gaps and some of the letters touch each other. Any ideas on how I could segment them efficiently?
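A common way around the "no empty column" problem is to use the vertical projection (the number of dark pixels in each column) and treat columns with only a few dark pixels, rather than zero, as gaps, then force a cut at the weakest column inside any piece that is too wide to be a single character. A minimal sketch of that idea, assuming the same 0/1 image layout; `noise_tol`, `min_width` and `max_width` are hypothetical parameters that would have to be estimated from the character size of the actual CAPTCHA.

```python
def column_profile(img):
    """Number of dark pixels in each column."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def segment_columns(img, noise_tol=1, min_width=3, max_width=12):
    """Return character column ranges as (start, end) pairs, end exclusive."""
    if min_width < 1 or max_width < 2 * min_width:
        raise ValueError("need min_width >= 1 and max_width >= 2 * min_width")
    prof = column_profile(img)
    # 1. Columns with at most noise_tol dark pixels are treated as gaps
    segments, start = [], None
    for x, c in enumerate(prof):
        if c > noise_tol and start is None:
            start = x
        elif c <= noise_tol and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(prof)))
    # 2. Drop slivers too narrow to be a character (likely leftover noise)
    segments = [(s, e) for s, e in segments if e - s >= min_width]
    # 3. Split pieces that are too wide (probably touching letters)
    result, stack = [], list(reversed(segments))
    while stack:
        s, e = stack.pop()
        if e - s <= max_width:
            result.append((s, e))
            continue
        # Cut at the column with the fewest dark pixels, keeping both halves >= min_width
        cut = min(range(s + min_width, e - min_width + 1), key=lambda x: prof[x])
        stack.append((cut, e))  # pushed first so the left half is processed first
        stack.append((s, cut))
    return result
```

Since a flood fill is already implemented, another option is to segment by connected components first and apply the width-based split only to components that are too wide; which works better depends on how often the letters actually touch.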
EDIT: Original image