
I'm trying to convert a captcha image to text.

I don't think this captcha is very difficult.

I open the image and preprocess it with OpenCV to make it easier to recognize.

Here is an example.

[Image: example captcha]

[Image: captcha after OpenCV processing]

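import cv2
import pytesseract
from PIL import Image

# Load the captcha, convert it to grayscale, and binarize it with Otsu's threshold
image = cv2.imread(filename)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imwrite('OPENCV.png', gray)

# Get text from the image (--psm 8 treats the image as a single word)
text = pytesseract.image_to_string(Image.open('OPENCV.png'), lang='eng', config="-c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ --psm 8")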

It's simple, but the result is 'PLLY2', whereas I want 'PLLVI2' or 'PLLV12'.

Is there any option or another approach I can use to get better accuracy?

I use the single-word option, '--psm 8'. I tried to find a way to make Tesseract recognize a fixed number of characters, but that seems to be impossible.
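Since I could not find a fixed-length option, the length can at least be checked after recognition. This is only a minimal sketch: it assumes every captcha has six characters from 0-9 and A-Z (as in 'PLLVI2'), and `is_plausible` is a hypothetical helper name, not part of pytesseract.

```python
import re

# Assumption: captchas are exactly six characters drawn from 0-9 and A-Z,
# matching the expected 'PLLVI2' example and the whitelist used above.
CAPTCHA_PATTERN = re.compile(r"[0-9A-Z]{6}")


def is_plausible(text: str) -> bool:
    """Return True if the OCR output looks like a complete captcha."""
    # strip() drops the trailing whitespace that OCR output often carries.
    return CAPTCHA_PATTERN.fullmatch(text.strip()) is not None


print(is_plausible("PLLY2"))   # False: only five characters
print(is_plausible("PLLVI2"))  # True
```

A failed check does not fix the result, but it shows when a character was dropped, so the image can be retried with different preprocessing.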

I would really appreciate even just a hint. Thank you very much for reading this question.

Smart Manoj
zenyatta

1 Answer


You could slice the image into individual letters and recognize each one with --psm 10 (single-character mode):

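import cv2
import pytesseract

image = cv2.imread(filename)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# Cut the binarized image into six columns, 25 pixels wide (the last takes the remainder)
gray1 = gray[:, :25]
gray2 = gray[:, 25:50]
gray3 = gray[:, 50:75]
gray4 = gray[:, 75:100]
gray5 = gray[:, 100:125]
gray6 = gray[:, 125:]

# Recognize each slice as a single character; strip() removes trailing whitespace from each result
config = '--psm 10 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
print(''.join(pytesseract.image_to_string(i, config=config).strip() for i in [gray1, gray2, gray3, gray4, gray5, gray6]))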
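The fixed 25-pixel offsets only fit an image of about that width, so the boundaries could instead be computed from the image itself. The following is a rough sketch, not a tested solution: `slice_bounds` and `read_captcha` are hypothetical helper names, and it assumes the characters are evenly spaced across the full width, which may not hold for every captcha.

```python
def slice_bounds(width: int, n_chars: int) -> list[tuple[int, int]]:
    """Split `width` columns into `n_chars` contiguous (start, end) ranges."""
    edges = [round(i * width / n_chars) for i in range(n_chars + 1)]
    return list(zip(edges[:-1], edges[1:]))


def read_captcha(gray, n_chars: int = 6) -> str:
    """OCR each equal-width slice of a binarized image as one character.

    `gray` is assumed to be a 2-D NumPy array such as the output of
    cv2.threshold above.
    """
    import pytesseract  # imported here so slice_bounds works without Tesseract

    config = ('--psm 10 -c tessedit_char_whitelist='
              '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    width = gray.shape[1]
    return ''.join(
        pytesseract.image_to_string(gray[:, start:end], config=config).strip()
        for start, end in slice_bounds(width, n_chars)
    )


# For a 150-pixel-wide image this reproduces the hand-written slices.
print(slice_bounds(150, 6))
# [(0, 25), (25, 50), (50, 75), (75, 100), (100, 125), (125, 150)]
```

With the thresholded image from the code above, `read_captcha(gray)` would replace the six manual slices. If the letters are not evenly spaced, the boundaries would need to come from the image content instead.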