How to differentiate between rotated character or a number from a simple line with OpenCV

Question

I'm trying to work out how to distinguish a character or digit from a simple line, so that I can filter out irrelevant input before passing it to Tesseract OCR. My idea is to use connectedComponentsWithStats to get the bounding box of each component and then count the white and black pixels inside that box. By thresholding the black/white (BW) ratio, I want to find the almost completely filled boxes, which are the lines I want to filter out.

My input is a large set of images, each containing only a single rotated character or line. I can rotate them using the minimum-area rectangle, but unfortunately I can't crop them. Do you have any hints, or maybe a better idea, for checking the BW ratio inside my rotated box?

Rotated component Character

more details

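    import cv2
    import numpy as np

    # rotated_without_box: binary image; non-zero pixels are treated as foreground
    analysis_of_single_groups = cv2.connectedComponentsWithStats(rotated_without_box, 4, cv2.CV_32S)
    (totalLabels_s_g, label_ids_s_g, values_s_g, centroid_s_g) = analysis_of_single_groups

    for i in range(1, totalLabels_s_g):  # label 0 is the background
        x = values_s_g[i, cv2.CC_STAT_LEFT]
        y = values_s_g[i, cv2.CC_STAT_TOP]
        w = values_s_g[i, cv2.CC_STAT_WIDTH]
        h = values_s_g[i, cv2.CC_STAT_HEIGHT]
        print("x: " + str(x))

        # crop the axis-aligned bounding box of this component
        crop_img = rotated_without_box[y:y + h, x:x + w].copy()
        cv2.imwrite("ta/cropped_" + str(i) + ".png", crop_img)

        number_of_black_pix = np.sum(crop_img == 0)  # black (background) pixels
        number_of_white_pix = np.sum(crop_img == 255)  # white (foreground) pixels
        if number_of_white_pix == 0:
            continue  # avoid division by zero
        bw_ratio = number_of_black_pix / number_of_white_pix
        is_line = bw_ratio < 0.9  # few black pixels means the box is almost filled
        print("component", i, "bw_ratio", bw_ratio, "is_line", is_line)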

It's not quite clear what you want to do. But what about trying OCR on a region of interest rotated four ways and keeping the best read? — , Jul 25 '22 at 18:37
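The suggestion in this comment could be sketched roughly as follows. This is only an illustration under assumptions: `best_of_four_rotations` and `read_fn` are hypothetical names that do not appear in the thread, and `read_fn` stands for whatever OCR call you use, wrapped so that it returns a `(text, confidence)` pair. It also assumes the region of interest has already been deskewed to a multiple of 90 degrees.

```python
import numpy as np


def best_of_four_rotations(roi, read_fn):
    """Try the ROI at 0/90/180/270 degrees and keep the most confident read.

    read_fn is a hypothetical caller-supplied callable: image -> (text, confidence).
    Returns (text, confidence, angle_in_degrees); angles are counterclockwise.
    """
    best = None
    for quarter_turns in range(4):
        # np.rot90 rotates counterclockwise; copy so the OCR gets a contiguous array
        rotated = np.ascontiguousarray(np.rot90(roi, k=quarter_turns))
        text, confidence = read_fn(rotated)
        if best is None or confidence > best[1]:
            best = (text, confidence, quarter_turns * 90)
    return best
```

A region whose best confidence stays low in all four orientations would then be a candidate for "not a character", although the cut-off has to be chosen on your own data.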

Gralex · Answer 1 · 2022-07-28T13:21:51.883

Use cv2.findContours -> keep only the top-level contours via the hierarchy -> filter by the height/width (HW) ratio of each contour's minimum-area rectangle.

Image 2:

import cv2
import numpy as np

gray = cv2.imread("/Users/alex/Downloads/dwojD_2.png", cv2.IMREAD_GRAYSCALE)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

# https://docs.opencv.org/4.x/d9/d8b/tutorial_py_contours_hierarchy.html
contours, hierarchy = cv2.findContours(image=thresh, mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_NONE)
hierarchy = hierarchy[0]

bgr = cv2.cvtColor(thresh, cv2.COLOR_GRAY2BGR)
for i in range(len(hierarchy)):
    if hierarchy[i][3] >= 0:
        continue  # skip nested contours (they have a parent), e.g. holes inside a symbol
    
    rRect = cv2.minAreaRect(contours[i])
    size = rRect[1]
    if min(size) == 0: 
        continue
    ratio = max(size) / min(size)
    print("Min rect size", size, "; Ratio", ratio)

    # you can filter contours by width and height ratio
    isSymbol = ratio < 3
    color = (0, 255, 0) if isSymbol else (0, 0, 255)

    print("> Symbol!" if isSymbol else "> Line!")

    
    # cv2.drawContours(image=bgr, contours=contours, contourIdx=i, color=(0, 255, 0), thickness=2, lineType=cv2.LINE_AA)
    # np.int0 was removed in NumPy 2.0; cast the corner points to int32 explicitly
    box = cv2.boxPoints(rRect).astype(np.int32)
    cv2.drawContours(image=bgr, contours=[box], contourIdx=0, color=color, thickness=2, lineType=cv2.LINE_AA)
    

cv2.imshow("img", bgr)
cv2.waitKey()

I tried this, but it does not work if the line is shorter ... and there is also the risk of deleting 1s — Alessandro, Jul 28 '22 at 11:46
You can adjust the ratio threshold yourself. If it still does not work, please provide an image. In your image, only the symbol `A` appears. — Gralex, Jul 28 '22 at 12:12
I have provided another image. My plan is to delete the (sometimes slightly curved) lines — Alessandro, Jul 28 '22 at 12:57
