I am using PyTesser to break a captcha. PyTesser is a Python wrapper around the Tesseract OCR engine. Before passing the image to PyTesser, I apply some filtering. Here is my code, step by step, starting from the input image:

from PIL import Image
img = Image.open('1.gif')
img = img.convert("RGBA")
pixdata = img.load()
# Clean the background noise, if color != black, then set to white.
for y in xrange(img.size[1]):
    for x in xrange(img.size[0]):
        if pixdata[x, y][0] < 90:
            pixdata[x, y] = (0, 0, 0, 255)

for y in xrange(img.size[1]):
    for x in xrange(img.size[0]):
        if pixdata[x, y][2] < 136:
            pixdata[x, y] = (0, 0, 0, 255)

for y in xrange(img.size[1]):
    for x in xrange(img.size[0]):
        if pixdata[x, y][3] > 0:
            pixdata[x, y] = (255, 255, 255, 255)


img.save("input-black.gif", "GIF")

After applying this code, the output image is:

Now,

im_orig = Image.open('input-black.gif')
big = im_orig.resize((116, 56), Image.NEAREST)

ext = ".tif"
big.save("input-NEAREST" + ext)

After this snippet, the output image is:

Finally, when I run this:

from pytesser import *
image = Image.open('input-NEAREST.tif')
print image_to_string(image)

The output I get is: %/ww

How can I get the correct result?

With the following images, the same code recognizes the letters successfully.

Moshi
  • Have you tested it on an easy to recognize image? Does PyTesser need training? – BlamKiwi Feb 10 '15 at 06:22
  • I tested it with some images, and they were recognized successfully. – Moshi Feb 10 '15 at 06:23
  • Then maybe that's just what it thinks the image is. – BlamKiwi Feb 10 '15 at 06:49
  • I think the problem is the extra zigzag line in my captcha. But how can I remove the zigzag line so that I can pass an image with just the letters to PyTesser? – Moshi Feb 10 '15 at 09:13
  • Just throwing something crazy out there: you could start on the left and find a path of black pixels all the way to the right (a depth-first search). Repeat this until there are no more paths left? It will affect the letters, though, and is perhaps rather computationally intensive. – Vincent Ketelaars Mar 01 '15 at 06:42
  • Maybe you can create some samples, since all those captchas are similar, and compare against the samples. – Fernando Freitas Alves Mar 06 '15 at 13:29

1 Answer

You need to apply some basic morphological image operations to remove the line before running the captcha recognizer. Try combining scipy.ndimage.binary_erosion and scipy.ndimage.binary_dilation.
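
As a rough illustration of that idea, here is a minimal sketch of erosion followed by dilation (a morphological opening) on a synthetic boolean mask. The helper name remove_thin_lines, the 3x3 structuring element, and the test mask are assumptions made for this example, not something taken from the question's images. Erosion removes anything thinner than the structuring element, and dilation then grows the surviving strokes back to roughly their original size.

```python
import numpy as np
from scipy import ndimage


def remove_thin_lines(binary, size=3):
    """Erode, then dilate, a boolean mask (True = ink).

    `size` is the side of the square structuring element (an assumed
    value; it must be larger than the line width and smaller than the
    letter stroke width for this to work).
    """
    structure = np.ones((size, size), dtype=bool)
    eroded = ndimage.binary_erosion(binary, structure=structure)
    return ndimage.binary_dilation(eroded, structure=structure)


# Synthetic stand-in for a captcha: a thick 6x6 block (a "letter stroke")
# crossed by a 1-pixel-high horizontal line.
mask = np.zeros((12, 20), dtype=bool)
mask[3:9, 4:10] = True   # thick stroke
mask[6, :] = True        # thin line across the whole image

cleaned = remove_thin_lines(mask)
```

To try this on a real captcha, the image would first have to be turned into a boolean array, for example with np.array(img.convert("L")) < 128, where the threshold 128 is again an assumption. If the letter strokes are about as thin as the zigzag line, this approach will damage the letters too, so the structuring element size needs tuning per captcha style.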

sniperd