I am using PyTesser to break a captcha. PyTesser is a Python OCR wrapper around the Tesseract engine. Before passing the image to PyTesser, I apply some filtering. Here is my code, step by step:
The input image is:
from PIL import Image

img = Image.open('1.gif')
img = img.convert("RGBA")
pixdata = img.load()

# Clean the background noise: if color != black, then set to white.
for y in xrange(img.size[1]):
    for x in xrange(img.size[0]):
        if pixdata[x, y][0] < 90:
            pixdata[x, y] = (0, 0, 0, 255)

for y in xrange(img.size[1]):
    for x in xrange(img.size[0]):
        if pixdata[x, y][2] < 136:
            pixdata[x, y] = (0, 0, 0, 255)

for y in xrange(img.size[1]):
    for x in xrange(img.size[0]):
        if pixdata[x, y][3] > 0:
            pixdata[x, y] = (255, 255, 255, 255)

img.save("input-black.gif", "GIF")
After applying this code, the output is:
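As a side note, the same kind of cleanup can also be done with a single grayscale threshold instead of per-channel passes. This is only a sketch, and the cut-off value of 90 is an assumption that would need tuning:

from PIL import Image

# Sketch: one grayscale threshold instead of three per-channel passes.
# The cut-off 90 is an assumed value, not taken from the captcha itself.
img = Image.open('1.gif').convert('L')            # grayscale
img = img.point(lambda p: 0 if p < 90 else 255)   # dark -> black, rest -> white
img.save('input-black.gif', 'GIF')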
Now,
im_orig = Image.open('input-black.gif')
big = im_orig.resize((116, 56), Image.NEAREST)
ext = ".tif"
big.save("input-NEAREST" + ext)
After this code snippet, the output image is:
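For comparison, one could also scale relative to the original size instead of hard-coding 116x56; the factor of 2 and the ANTIALIAS filter in this sketch are just assumptions to try against NEAREST:

from PIL import Image

im_orig = Image.open('input-black.gif')
w, h = im_orig.size
# Assumed scale factor of 2; ANTIALIAS is an alternative filter to compare with NEAREST.
big = im_orig.resize((w * 2, h * 2), Image.ANTIALIAS)
big.save('input-ANTIALIAS.tif')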
And finally, when I apply this:
from pytesser import *
image = Image.open('input-NEAREST.tif')
print image_to_string(image)
I get %/ww as output.
Please help me find the correct result.
If I try it with these images, the code successfully recognizes the letters.
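For quick testing over several images, the OCR step can be looped like this; the folder name and the *.tif pattern are only placeholders for the images mentioned above:

from PIL import Image
from pytesser import *
import glob

# Run the same OCR step over a folder of test captchas.
# "test-captchas" and the *.tif pattern are placeholder names.
for path in glob.glob("test-captchas/*.tif"):
    image = Image.open(path)
    print path, "->", image_to_string(image).strip()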