Captcha recognition using OCR

Question

I am trying to write code that solves captchas for images like this one:

Captcha

Here is the processed image:

[Image: the captcha after processing]

And my code:

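import cv2
import pytesseract

image = cv2.imread("captcha.png")  # assumed input path; the original snippet does not show the loading step
image = cv2.resize(image, (300, 120))
image = cv2.dilate(image, None, iterations=1)
image = cv2.GaussianBlur(image, (1, 9), 0)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu picks the threshold automatically; BINARY_INV turns dark text white
image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
image = cv2.medianBlur(image, 5)
cv2.imshow("Image", image)
cv2.waitKey(0)  # keep the preview window open until a key is pressed
cv2.imwrite("im.jpg", image)
# The config must be a single string; --psm 8 treats the image as one word
text = pytesseract.image_to_string(
    image,
    config="--psm 8 -c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz",
)
print(text)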

But the code predicts 9922s for the given captcha instead of pez2s. Please help me solve this problem.

I think you could get better results by segmenting each character and then calling `pytesseract.image_to_string` on each one. Perhaps the result is incorrect because the characters are different sizes. — Ha Bom, Nov 17 '18 at 15:46
@HaBom That won't work in a generalized version as the captcha is placed at random places in the image and the width of letters is also different. — Muskan Bansal, Nov 17 '18 at 15:56
You can segment the characters using the second method from my answer in [this post](https://stackoverflow.com/questions/53317536/segment-each-character-from-noisy-number-plate/53330636#53330636). Then you can sort the characters from left to right, so no matter where they are placed in the image, you still get the right order. Sorry, I'm not at my work computer right now, so I can't give you more details. — Ha Bom, Nov 17 '18 at 16:24
Doh! It’s almost as if the captcha is designed to be difficult to solve using a computer and OCR. Oh well. — DisappointedByUnaccountableMod, Nov 17 '18 at 20:52
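
The segment-then-sort idea from the comments could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method from the linked answer: the function names `sort_boxes_left_to_right` and `read_characters` and the `min_area` value of 50 are made up for this example, the input is assumed to be a binary image with white characters on a black background (what `THRESH_BINARY_INV` produces for dark text), and characters that touch each other would end up in a single blob.

```python
def sort_boxes_left_to_right(boxes, min_area=50):
    """Drop tiny boxes (likely noise) and order the rest by x coordinate.

    Each box is an (x, y, w, h) tuple, the format cv2.boundingRect returns.
    min_area is an arbitrary illustrative cutoff, not a tuned value.
    """
    kept = [b for b in boxes if b[2] * b[3] >= min_area]
    return sorted(kept, key=lambda b: b[0])


def read_characters(binary_image, min_area=50):
    """OCR each connected blob separately and join the results in reading order."""
    # Imported lazily so the sorting helper stays usable without OpenCV installed.
    import cv2
    import pytesseract

    # [-2] picks the contour list in both the 2-value and 3-value return styles.
    contours = cv2.findContours(
        binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )[-2]
    boxes = [cv2.boundingRect(c) for c in contours]

    chars = []
    for x, y, w, h in sort_boxes_left_to_right(boxes, min_area):
        crop = binary_image[y:y + h, x:x + w]
        # Invert to dark-on-light and pad with a white margin before OCR.
        crop = cv2.copyMakeBorder(
            cv2.bitwise_not(crop), 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=255
        )
        # --psm 10 asks Tesseract to treat the image as a single character.
        text = pytesseract.image_to_string(
            crop,
            config="--psm 10 -c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz",
        )
        chars.append(text.strip())
    return "".join(chars)
```

Calling `read_characters(image)` on the thresholded `image` from the question would be the intended use. Because the boxes are sorted by their x coordinate, the position of the captcha within the image and the differing letter widths should not affect the reading order, though whether per-character OCR actually reads this captcha correctly is untested here.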

