I'm creating my own dataset for people and street segmentation. Below, you see a labeled ground truth (GT) image.
In the past I did a simple regression between the model output and the GT image (back then I only used streets). Now I have read that cross-entropy loss is more common in that case. Since my GT image and the model output have the same width w and height h as the input image, I have to create an array of size h x w x c, where c is the number of classes (in my case 3: background, street, people). I think this is called a one-hot encoded array.
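For reference, such a one-hot array can be built directly from a per-pixel class map by indexing into an identity matrix with np.eye (a sketch with a hypothetical 2x2 class map):

```python
import numpy as np

# Hypothetical 2x2 class map: 0 = background, 1 = street, 2 = people
class_map = np.array([[0, 1],
                      [2, 0]])
num_classes = 3

# np.eye(c)[class_map] picks row `class_map[i, j]` of the identity
# matrix for every pixel, giving an array of shape h x w x c
one_hot = np.eye(num_classes, dtype=np.uint8)[class_map]
print(one_hot.shape)  # (2, 2, 3)
```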
I solved this as follows:
for height in range(len(img_as_np_array)):
    for width in range(len(img_as_np_array[0])):
        temp = np.zeros(classes)
        # call get_class once per pixel and set the matching index,
        # so the background class (0) is one-hot encoded as well
        pixel_class = get_class(img_as_np_array[height, width])
        temp[pixel_class] = 1
        one_hot_label[height, width] = temp
where the method get_class(channels) decides the pixel class by the color of the pixel.
def get_class(channels):
    threshold = 40
    # Class 1 corresponds to streets/roads (orange, roughly [243, 169, 0])
    if 243 - threshold <= channels[0] <= 243 + threshold and \
       169 - threshold <= channels[1] <= 169 + threshold and \
       channels[2] < threshold:
        return 1
    # Class 2 corresponds to people (blue, roughly [0, 163, 232])
    if channels[0] < threshold and \
       163 - threshold <= channels[1] <= 163 + threshold and \
       232 - threshold <= channels[2] <= 232 + threshold:
        return 2
    # Class 0 corresponds to background and everything else
    return 0
I have two questions:
1. My approach is very slow (about 3 minutes for a Full HD image). Is there a way to speed this up?
2. I noticed that the channel values differ from the expected colors. For example, orange should be [243, 169, 0] (RGB), but I found entries like [206, 172, 8] or even [207, 176, 24]. Could that happen because I store my labels as JPG? Is there a better way to find the orange and blue pixels than my threshold idea above?
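One alternative to per-channel thresholds would be to assign each pixel to its nearest reference color, falling back to background when no reference is close enough. A sketch (the reference colors come from above; max_dist is an assumed tolerance):

```python
import numpy as np

# Reference colors from the question; background has no reference color
REF_COLORS = np.array([[243, 169, 0],   # class 1: streets (orange)
                       [0, 163, 232]])  # class 2: people (blue)

def nearest_class(img, max_dist=60):
    """Assign each pixel the class of its nearest reference color,
    or background (0) if no reference is within max_dist."""
    # per-pixel difference to every reference color: h x w x num_refs x 3
    diff = img[:, :, None, :].astype(np.int32) - REF_COLORS[None, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)            # h x w x num_refs
    classes = np.argmin(dist, axis=-1) + 1          # 1-based class ids
    classes[np.min(dist, axis=-1) > max_dist] = 0   # too far from any reference
    return classes

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [206, 172, 8]   # drifted orange from the question
print(nearest_class(img))   # pixel (0, 0) maps to class 1, rest to 0
```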
EDIT:
I solved the first question myself. This takes 2 or 3 seconds for a Full HD image:
threshold = 40
class_1_shape_cond_1 = (img_as_array[:, :, 0] >= 243 - threshold) & (img_as_array[:, :, 0] <= 243 + threshold)
class_1_shape_cond_2 = (img_as_array[:, :, 1] >= 169 - threshold) & (img_as_array[:, :, 1] <= 169 + threshold)
class_1_shape_cond_3 = (img_as_array[:, :, 2] >= 0) & (img_as_array[:, :, 2] <= threshold)
class_1_shape = class_1_shape_cond_1 & class_1_shape_cond_2 & class_1_shape_cond_3
Then I do the same for class 2, and for class 3 (everything else) I can do:
class_3_shape = 1 - (class_1_shape + class_2_shape)
After that I have to adjust the type with:
class_1_shape = class_1_shape.astype(np.uint8)
class_2_shape = class_2_shape.astype(np.uint8)
class_3_shape = class_3_shape.astype(np.uint8)
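Stacking the three masks then yields the h x w x c one-hot array directly. A sketch with hypothetical 2x2 binary masks standing in for the ones produced by the threshold conditions above:

```python
import numpy as np

# Hypothetical 2x2 binary masks (1 where the pixel belongs to the class)
class_1_shape = np.array([[1, 0], [0, 0]], dtype=np.uint8)  # streets
class_2_shape = np.array([[0, 1], [0, 0]], dtype=np.uint8)  # people
class_3_shape = 1 - (class_1_shape + class_2_shape)         # background

# Class order matches the question: index 0 = background, 1 = streets, 2 = people
one_hot_label = np.stack([class_3_shape, class_1_shape, class_2_shape], axis=-1)
print(one_hot_label.shape)  # (2, 2, 3)
```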
Question 2 is still open.