I'm trying to design a CAPTCHA recognition algorithm using Python, OpenCV and Tesseract. The problem is that the CAPTCHA digits are misaligned and randomly clustered within the image, so pytesseract fails and returns empty lists most of the time. The data looks like this: [image] should return 41332, and [image] should return 35545.
The relatively flat CAPTCHAs, where the digits sit on a single line, are detected better. How do I solve this? How can I detect, crop and realign the digits in images like these so that they are easier for Tesseract (if it needs to be used at all) to read?