I'm trying to read this captcha. I'm using ImageMagick to clean up the image and Tesseract to read the text, but without success. Could anyone help me?

convert captcha.png -fuzz -50% -transparent black -fuzz 15% -fill black -opaque 'rgb(16,128,176)' -fuzz 40% -fill white +opaque blue -colorspace gray captcha_clean.png

tesseract --psm 7 captcha_clean.png stdout

Should I use Python instead? Does anyone have a working Python example?


UPDATE:

Here is how to detect the black horizontal line and replace it with yellow:

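    import cv2
    import numpy as np

    # Load the captcha and upscale it to twice its original size.
    img = cv2.imread("JgqK4.png")
    (h, w) = img.shape[:2]
    img = cv2.resize(img, (w*2, h*2))

    # Select pure-black pixels only: both HSV bounds are (0, 0, 0).
    blackMin = np.array([0, 0, 0], np.uint8)
    blackMax = np.array([0, 0, 0], np.uint8)
    HSV  = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(HSV, blackMin, blackMax)

    # Grow the mask so it also covers the pixels bordering the line.
    SE   = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (8,9))
    mask = cv2.dilate(mask, SE, iterations=1)

    # Paint the masked pixels yellow (OpenCV uses BGR order) and save the result.
    img[mask>0] = [0,255,255]
    cv2.imwrite("JgqK4_2.png", img)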
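
The mask above matches only pixels that are exactly black, so a line that is merely very dark would be missed. As a rough sketch of a more tolerant variant, the NumPy-only snippet below marks every pixel whose channels all fall under a threshold and then reports the rows that are mostly dark. The helper names `near_black_mask` and `line_rows`, the threshold of 40, the 50% row fraction, and the synthetic test image are all assumptions made for illustration; they are not taken from the real captcha.

```python
import numpy as np

def near_black_mask(img_bgr, max_value=40):
    """Boolean mask of pixels whose B, G and R values are all <= max_value."""
    return np.all(img_bgr <= max_value, axis=2)

def line_rows(mask, min_fraction=0.5):
    """Indices of rows in which at least min_fraction of the pixels are masked."""
    return np.flatnonzero(mask.mean(axis=1) >= min_fraction)

# Synthetic stand-in for the captcha: light background, one dark horizontal stroke.
img = np.full((20, 60, 3), 230, dtype=np.uint8)
img[9:11, :, :] = (5, 8, 3)      # nearly black, but not exactly (0, 0, 0)

mask = near_black_mask(img)
rows = line_rows(mask)
img[mask] = (0, 255, 255)        # yellow in OpenCV's BGR channel order
print(rows.tolist())             # prints [9, 10]
```

On a real image the threshold would probably need tuning, and dark characters that cross the line would be caught by the same mask.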
Pavel Eremin
  • You're not likely to get help with that on this forum. – GeeMack Jun 30 '23 at 18:01
  • @GeeMack Do you know where and how I can find a solution or some help? – Pavel Eremin Jul 06 '23 at 19:09
  • @ahmet What do you think, is it solvable at all? – Pavel Eremin Jul 19 '23 at 20:07
  • As I understand it, this is a bit similar to https://stackoverflow.com/questions/65806979/transparent-captcha-image-with-horizontal-line – Pavel Eremin Jul 19 '23 at 20:09
  • You can't use popular open-source OCR tools like Tesseract or EasyOCR for this. A captcha is mostly used to check whether a physical, legitimate user is present on the client side. If any OCR tool could perform this task, the whole idea would be useless. In such cases, you may have to turn to deep learning methods or retrain existing OCR models that provide an end-to-end training pipeline. – tintin98 Jul 24 '23 at 18:49
  • Also, why do you need to do this? Can you please clarify that in your question? – tintin98 Jul 25 '23 at 16:41
  • These characters are not from standard font types. I would be surprised if any OCR tool like Tesseract could recognize them correctly and consistently, even after preprocessing. One solution is to train a new DL model to recognize the characters from this specific captcha system, as others have already pointed out. – karlphillip Jul 26 '23 at 20:06

1 Answer

CAPTCHA is an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart". This explains why reading the text in a CAPTCHA is a very hard task for a machine: it is hard by design.

OCR will only work for very simple CAPTCHA graphics.

If you really want to find a solution, you could train a machine-learning model. That will cost you a few hours of programming and training, plus quite a bit of computing power, which is again by design.
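
To give a feel for what "training a model" means at the smallest possible scale, here is a hedged sketch of a nearest-centroid classifier over single-character crops, written with NumPy only. It is a stand-in for illustration, not a captcha solver: a real attempt would more likely use a convolutional network and would first need labeled crops cut from actual captchas. The function names `fit_centroids` and `predict`, the 5x5 glyphs, and the noise level are all hypothetical.

```python
import numpy as np

def fit_centroids(samples, labels):
    """Average all training crops of each character into one template."""
    samples = np.asarray(samples, dtype=np.float32)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.stack([samples[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(crop, classes, centroids):
    """Return the class whose template is closest (Euclidean) to the crop."""
    crop = np.asarray(crop, dtype=np.float32)
    diffs = (centroids - crop).reshape(len(classes), -1)
    return classes[int(np.argmin(np.linalg.norm(diffs, axis=1)))]

# Two synthetic 5x5 glyphs standing in for labeled character crops.
glyph_i = np.zeros((5, 5), dtype=np.float32)
glyph_i[:, 2] = 1.0                      # vertical bar
glyph_l = np.zeros((5, 5), dtype=np.float32)
glyph_l[:, 0] = 1.0                      # vertical bar on the left edge
glyph_l[4, :] = 1.0                      # plus a bottom stroke

rng = np.random.default_rng(0)

def noisy(glyph):
    """Copy of a glyph with mild Gaussian noise added."""
    return glyph + rng.normal(0.0, 0.2, glyph.shape).astype(np.float32)

train_x = [noisy(glyph_i) for _ in range(10)] + [noisy(glyph_l) for _ in range(10)]
train_y = ["I"] * 10 + ["L"] * 10
classes, centroids = fit_centroids(train_x, train_y)

guess_i = predict(noisy(glyph_i), classes, centroids)
guess_l = predict(noisy(glyph_l), classes, centroids)
print(guess_i, guess_l)
```

The two glyphs here are far apart compared with the noise, so the toy problem is easy. Distorted, overlapping captcha characters are much harder, which is where the training time and computing power go.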

Adrian Dymorz