Isolate colored text regions in video stream

Question

I want to detect colored text from a height of 5-6 meters in live video. The text is roughly 30-40 cm wide. I have tried a few methods. One is HSV thresholding to detect colors, but it is not reliable because the HSV values change when the illumination of the environment changes. It also fails to detect the colors beyond about 30 cm. I also looked into OCR for text recognition. According to my research, people recommend color detection for this task because it is easier than OCR and is sufficient for the desired result.

All in all, how can I detect red and green text from 5 to 6 meters away in a live video stream, regardless of whether the camera is used indoors or outdoors?

@EnderAyhan Is there always exactly one red and one green text, and are they always at the top/bottom? Do you just want to know which text (red/green) is at the top of the maze and which is at the bottom? Also, are the words in red/green always the same? — duhaime, Aug 22 '18 at 21:27
Why are people voting to close this? This problem seems reasonably well defined, and his code isn't going to help anyone answer the question. — duhaime, Aug 22 '18 at 21:29
@duhaime Yes, there is always one red and one green text. Moreover, the texts are always the same, as in the example image. However, they are not always in the same place; their position varies. I just want to detect the coordinates of their center points so that I can find the start and finish coordinates of the maze. — Ender Ayhan, Aug 22 '18 at 21:31
@EnderAyhan are the red and green texts always on opposite sides of the maze, or are they sometimes on neighboring sides? — duhaime, Aug 22 '18 at 21:33
@duhaime They can be placed either next to each other or opposite each other. I mean the position of the texts is never known exactly; it is always changing. By the way, thank you for your support on my question. This downvoting issue is always happening. Unfortunately, it is hard to be sure what to ask. — Ender Ayhan, Aug 22 '18 at 21:38

score 1 · Accepted Answer · answered Aug 22 '18 at 23:01

This is more a suggestion for a possible way forward than a solution, but one thought would be to examine the aggregate hue of each row in the image.

Green (the top label) has a hue value of ~90, and red (the bottom label) has a hue value of ~0, so if we compute the sum of the hue values for each row in the image, we'd expect the greenest rows to have the highest hue values and the red rows to have the lowest hue values.

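import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from colorsys import rgb_to_hsv
%matplotlib inline

# read in the image as an RGB array
# (scipy.misc.imread was removed in SciPy 1.2, so Pillow is used instead)
img = np.array(Image.open('vUvMl.jpg').convert('RGB'))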

# find the sum of the Hue, Saturation, and Value values
# for each row in the image, top to bottom
rows = []
h_vals = []
s_vals = []
v_vals = []

for idx, row in enumerate(img):
    row_h = 0
    row_s = 0
    row_v = 0
    for pixel in row:
        r, g, b = pixel / 255.0  # scale 8-bit channels to [0, 1]
        h, s, v = rgb_to_hsv(r, g, b)
        row_h += h
        row_s += s
        row_v += v
    h_vals.append(row_h)
    s_vals.append(row_s)
    v_vals.append(row_v)
    rows.append(idx)

# plot the aggregate hue values for each row of the image
plt.scatter(rows, h_vals)
plt.title('Aggregate hue values for each row in image')
plt.show()

Result:

The plot has high values toward the left (the top rows of the image) and low values toward the right (the bottom rows), suggesting the green text is at the top of the image and the red text is at the bottom of the image.
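One caveat when reading the plot: `colorsys.rgb_to_hsv` reports hue on a 0 to 1 scale, and hue is circular, so a slightly bluish red lands near 1 rather than near 0. The snippet below is a small sketch of that behaviour; `hue_distance` is a hypothetical helper, not part of any library, that measures how far a hue is from a target while allowing for the wrap-around.

```python
from colorsys import rgb_to_hsv


def hue_distance(hue, target):
    """Circular distance between two hues on colorsys's 0..1 scale."""
    diff = abs(hue - target)
    return min(diff, 1.0 - diff)


# Hue is the first element of the (h, s, v) tuple returned by colorsys.
pure_red_hue = rgb_to_hsv(1.0, 0.0, 0.0)[0]
pure_green_hue = rgb_to_hsv(0.0, 1.0, 0.0)[0]
# A red with a small blue component wraps around to the top of the scale.
bluish_red_hue = rgb_to_hsv(1.0, 0.0, 0.04)[0]

print(pure_red_hue, pure_green_hue, bluish_red_hue)
print(hue_distance(bluish_red_hue, pure_red_hue))
```

If the red label in a real frame wraps this way, summing raw hue values could make red rows look high instead of low, so comparing distances to a target hue may be a safer signal than the raw sum.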

You'd need to transpose the image matrix and find the column-wise hue values if one of the labels were on the left/right side of the image, but hopefully this can spur your ideas...
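As a rough sketch of the row-wise and column-wise idea combined, the following uses NumPy and `matplotlib.colors.rgb_to_hsv` to build both profiles at once and estimate a label's center. Summing along `axis=0` gives the same column profile as transposing the image would. The function name `locate_label`, the tolerance and threshold defaults, and the synthetic test frame are all assumptions made for illustration; the thresholds would almost certainly need tuning for real footage and changing light.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv


def locate_label(img_rgb, target_hue, hue_tol=0.08, min_sat=0.5, min_val=0.3):
    """Estimate the (row, col) center of pixels near target_hue, or None.

    Hue is on a 0..1 scale. All thresholds are illustrative guesses.
    """
    hsv = rgb_to_hsv(img_rgb.astype(float) / 255.0)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]

    # Circular hue distance, so reds near 0 and near 1 both match.
    diff = np.abs(h - target_hue)
    hue_dist = np.minimum(diff, 1.0 - diff)

    # Ignore washed-out or dark pixels, whose hue is unreliable.
    mask = (hue_dist < hue_tol) & (s > min_sat) & (v > min_val)
    if not mask.any():
        return None

    # Row and column profiles: matching-pixel counts per row / per column.
    row_profile = mask.sum(axis=1)
    col_profile = mask.sum(axis=0)

    # The profile-weighted mean index gives a center estimate on each axis.
    row = np.average(np.arange(mask.shape[0]), weights=row_profile)
    col = np.average(np.arange(mask.shape[1]), weights=col_profile)
    return float(row), float(col)


# Synthetic frame (an assumption for the demo): gray background,
# one green block and one red block standing in for the two labels.
frame = np.full((100, 120, 3), 128, dtype=np.uint8)
frame[10:20, 30:60] = (0, 200, 0)
frame[80:90, 70:100] = (200, 0, 0)

green_center = locate_label(frame, target_hue=1.0 / 3.0)
red_center = locate_label(frame, target_hue=0.0)
print(green_center, red_center)
```

Because the same function handles both axes, it does not matter which side of the maze each label is on. On a live stream it could be called once per frame for each target hue.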

It is definitely a different point of view, and it got me thinking outside the box. Last but not least, do you recommend doing this with deep learning? — Ender Ayhan, Aug 22 '18 at 23:54
@EnderAyhan if you are comfortable with neural networks it could be worth a try. It's good that your prediction space is well constrained: each image has only four sides (up, right, down, left) and each side will have a value of red, green, or None. If you hand labelled some images it wouldn't be too hard to train a classifier. If you are at a university others there may even be able to help you with the task... — duhaime, Aug 23 '18 at 00:04
