Detecting numbers in images under different lighting conditions using Tesseract OCR and Python

Question

The problem occurs with this image, taken under a different lighting condition: Img_1

It worked for this morning image: Img_2

I am trying to extract numbers from a video using Tesseract OCR and Python. The video shows trains, and each train has a number plate that I detect with YOLO. Once I have the plate, I preprocess it and extract the number from it. This works, but when the lighting is different, or the video was shot at night, my preprocessing fails. I want general-purpose logic that preprocesses images so that extraction works for all images, not just some. I have attached some images for trial. In the code, I manually cropped just the plate region and passed it to getText(image), which performs the preprocessing and extracts the number.

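import cv2
import pytesseract

# otsu_thresholding, get_grayscale and adaptive_thresholding are helpers
# from the rest of my code (not shown here).
def getText(img):
    # First way: Otsu threshold, then force every non-zero pixel to white
    thresh = otsu_thresholding(img)
    gray = get_grayscale(thresh)
    gray[gray != 0] = 255

    # Second way: adaptive threshold on the grayscale image
    thresh_2 = adaptive_thresholding(get_grayscale(img), method=1)

    # Show the intermediate images for debugging
    cv2.imshow("img", img)
    cv2.imshow("Thresh", thresh)
    cv2.imshow("Thresh_2", thresh_2)
    cv2.imshow("Gray", gray)

    # Treat the plate as a single block of text (--psm 6) and allow digits only
    digits_config = '--psm 6 -c tessedit_char_whitelist=0123456789'
    data = pytesseract.image_to_string(gray, lang='eng', config=digits_config)
    data_2 = pytesseract.image_to_string(thresh_2, lang='eng', config=digits_config)
    return data, data_2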

Can anyone help me preprocess the images before feeding them to Tesseract?

those pictures are tiny and have no contrast. if you're working with those, nothing will work. please present the real data, not small pieces of it. please review [ask] and [mre]. the code you posted is 208 lines, with 109 lines of code. that's a lot and I doubt it's a MRE. — Christoph Rackwitz, Mar 08 '22 at 15:34
@ChristophRackwitz thank you for your help. plz ignore the CODE part (if you check getText(image) function that will be enough) but I wanted a robust/common logic to preprocess images in such a way that my number detection algorithm works for all images, not for some — satyam pawar, Mar 09 '22 at 11:46
Unlike what science fiction movies have shown, you can't extract information from an image that doesn't have it. You will NEVER get an algorithm that works for all images. It's flat out impossible. Your first image just doesn't have enough contrast. — Tim Roberts, Mar 10 '22 at 05:32
