Semantic Image Segmentation with colored masks

Question

So I have a set of pictures with color masks; for example, blue is for chairs, red for lamps, etc.

As I am new to all of this, I have tried doing this with the U-Net model, and I have processed the images for Keras like this:

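import os
import random

import cv2
import numpy as np


def data_generator(img_path, mask_path, batch_size):
    # Pair each image with its mask; sorting assumes both folders list files in matching order
    pairs = list(zip(sorted(os.listdir(img_path)), sorted(os.listdir(mask_path))))
    random.shuffle(pairs)  # shuffle the pairs together so images and masks stay aligned
    c = 0
    while True:
        img = np.zeros((batch_size, 256, 256, 3), dtype=np.float32)
        mask = np.zeros((batch_size, 256, 256, 1), dtype=np.float32)

        for i in range(c, c + batch_size):
            img_name, mask_name = pairs[i]
            train_img = cv2.imread(os.path.join(img_path, img_name)) / 255.
            img[i - c] = cv2.resize(train_img, (256, 256))

            # Grayscale mask scaled to [0, 1]; nearest-neighbour resizing avoids blending mask values
            train_mask = cv2.imread(os.path.join(mask_path, mask_name), cv2.IMREAD_GRAYSCALE) / 255.
            train_mask = cv2.resize(train_mask, (256, 256), interpolation=cv2.INTER_NEAREST)
            mask[i - c] = train_mask.reshape(256, 256, 1)

        c += batch_size
        if c + batch_size > len(pairs):  # not enough files left for another full batch
            c = 0
            random.shuffle(pairs)

        yield img, mask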

Looking closer, I think this approach won't work with my masks. I tried processing the masks as RGB color images, but my model won't train that way.

The model:

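from tensorflow.keras.layers import Conv2D, Dropout, Input, MaxPooling2D, UpSampling2D, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def unet(pretrained_weights = None,input_size = (256,256,3)):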
    inputs = Input(input_size)
    conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(inputs)
    conv1 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool1)
    conv2 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool2)
    conv3 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool3)
    conv4 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv4)
    drop4 = Dropout(0.5)(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(drop4)

    conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(pool4)
    conv5 = Conv2D(1024, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv5)
    drop5 = Dropout(0.5)(conv5)

    up6 = Conv2D(512, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(drop5))
    merge6 = concatenate([drop4,up6], axis = 3)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge6)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv6)

    up7 = Conv2D(256, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv6))
    merge7 = concatenate([conv3,up7], axis = 3)
    conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge7)
    conv7 = Conv2D(256, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv7)

    up8 = Conv2D(128, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv7))
    merge8 = concatenate([conv2,up8], axis = 3)
    conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge8)
    conv8 = Conv2D(128, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv8)

    up9 = Conv2D(64, 2, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(UpSampling2D(size = (2,2))(conv8))
    merge9 = concatenate([conv1,up9], axis = 3)
    conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(merge9)
    conv9 = Conv2D(64, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
    conv9 = Conv2D(2, 3, activation = 'relu', padding = 'same', kernel_initializer = 'he_normal')(conv9)
    conv10 = Conv2D(1, 1, activation = 'sigmoid')(conv9)

    model = Model(inputs = inputs, outputs = conv10)

    model.compile(optimizer = Adam(learning_rate = 1e-4), loss = 'binary_crossentropy', metrics = ['accuracy'])

    #model.summary()

    if(pretrained_weights):
        model.load_weights(pretrained_weights)

    return model

So my question is: how do I train a model with colored image masks?

Edit: here is an example of the data I have.

[Image: a training image]

[Image: the color mask for that image]

And the percentage of each mask class, like this: {"water": 4.2, "building": 33.5, "road": 0.0}
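Once every pixel carries a class label, those percentages are pixel counts divided by the image area. A minimal NumPy sketch, assuming a hypothetical `class_names` list whose order matches the class indices in the label map (0 = water, 1 = building, 2 = road in this toy example):

```python
import json

import numpy as np


def class_percentages(label_map, class_names):
    """Percentage of image area covered by each class in an (H, W) integer label map."""
    # Count pixels per class index; minlength makes absent classes count as 0
    counts = np.bincount(label_map.ravel(), minlength=len(class_names))
    total = label_map.size
    return {name: round(float(100.0 * counts[i] / total), 1) for i, name in enumerate(class_names)}


# Hypothetical 4x5 label map: 0 = water, 1 = building, 2 = road
label_map = np.array([
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [2, 2, 1, 1, 1],
    [2, 2, 2, 2, 2],
])
print(json.dumps(class_percentages(label_map, ["water", "building", "road"])))
# {"water": 10.0, "building": 55.0, "road": 35.0}
```

The same function can be applied to a predicted label map to produce the JSON for a new image.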

Could you please clarify your question? What are you trying to achieve? Do you want your NN to find objects within an image and label them? — Michael, Oct 01 '19 at 19:10
Not exactly label them, but tell the percentage of each item in the image. — Jaime Cuellar, Oct 01 '19 at 19:23
The percentage of the image area, because I have to produce a JSON file with that information. — Jaime Cuellar, Oct 01 '19 at 19:25
So your model will take an image as input, classify the objects in it according to the pre-learned masks, and calculate the percentage of area covered by each object? — Michael, Oct 01 '19 at 19:27
I edited my question to add more info. And yes, that is the objective of my model: for an image to predict, I should be able to know the percentage of road, water, and building. — Jaime Cuellar, Oct 01 '19 at 19:36
OK, now it's clear what you want to do. Thank you for updating the question! This topic looks too deep for me to recommend a particular model. Try looking for use cases on Google Colab and Kaggle; in general, search for the `satellite imagery feature detection` topic. — Michael, Oct 01 '19 at 19:56

Kaushik Roy · Answer 1 · 2019-10-02T02:36:44.733

In a semantic segmentation problem, each pixel belongs to exactly one of the target output classes/labels. Therefore, your output layer, conv10, should use the total number of classes (n_classes) as its number of filters, with softmax as the activation function, as follows:

conv10 = Conv2D(n_classes, 1, activation = 'softmax')(conv9)
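With softmax on the last axis, each pixel gets a probability distribution over the n_classes channels, and the predicted label is the channel with the highest probability. A small NumPy sketch of that per-pixel computation and of decoding it into a label map; the random logits are only a stand-in for real pre-activation outputs:

```python
import numpy as np


def softmax(logits, axis=-1):
    # Subtract the per-pixel max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
n_classes = 3
logits = rng.normal(size=(4, 4, n_classes))  # stand-in for conv10 pre-activations on a 4x4 image

probs = softmax(logits)            # shape (4, 4, n_classes); sums to 1 at every pixel
label_map = probs.argmax(axis=-1)  # shape (4, 4); one class index per pixel
print(label_map)
```

With a trained model, `model.predict(x)` returns an array of shape (batch, 256, 256, n_classes), and `argmax(axis=-1)` on it gives one label map per image.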

In this case, the loss should also be changed to categorical_crossentropy when compiling the U-Net model:

model.compile(optimizer = Adam(learning_rate = 1e-4), loss = 'categorical_crossentropy', metrics = ['accuracy'])
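For segmentation, categorical crossentropy is evaluated at every pixel over the class axis and then, roughly, averaged over pixels and batch. A NumPy sketch of that quantity, assuming one-hot targets and softmax probabilities of the same shape; the clipping epsilon is an assumption of this sketch that guards against log(0):

```python
import numpy as np


def pixelwise_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over all pixels of -sum_c y_true * log(y_pred), taken over the last (class) axis."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # assumed epsilon, avoids log(0)
    per_pixel = -(y_true * np.log(y_pred)).sum(axis=-1)
    return per_pixel.mean()


# One image of 1x2 pixels, three classes: first pixel predicted well, second poorly
y_true = np.array([[[1, 0, 0], [0, 0, 1]]], dtype=float)  # shape (1, 2, 3), one-hot
y_pred = np.array([[[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]])   # softmax outputs
loss = pixelwise_categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 1.2629
```

Only the probability assigned to the true class of each pixel contributes, which is why the targets must be one-hot rather than normalized color values.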

Additionally, you should not normalize your ground-truth mask image; instead, one-hot encode it as follows:

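import numpy as np

# mask: (height, width) array holding one integer class index (0 .. n_classes-1) per pixel
train_mask = np.zeros((height, width, n_classes))
for c in range(n_classes):
    train_mask[:, :, c] = (mask == c).astype(int)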
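The one-hot encoding above assumes the mask already holds one integer class index per pixel, while the masks in the question are colors. One way to bridge that is a lookup from each mask color to a class index, followed by the same one-hot step (indexing `np.eye` is equivalent to the loop). The `PALETTE` colors and class order below are hypothetical; replace them with the exact RGB values used in your masks. Note that `cv2.imread` returns BGR, so convert with `cv2.cvtColor(mask, cv2.COLOR_BGR2RGB)` first, and resize masks with `cv2.INTER_NEAREST` so no in-between colors are created.

```python
import numpy as np

# Hypothetical palette: RGB color in the mask -> class index (adjust to your data)
PALETTE = {
    (0, 0, 0): 0,        # background
    (0, 0, 255): 1,      # water (blue)
    (255, 0, 0): 2,      # building (red)
    (128, 128, 128): 3,  # road (gray)
}


def rgb_mask_to_index(rgb_mask, palette=PALETTE):
    """Map an (H, W, 3) uint8 RGB mask to an (H, W) array of class indices."""
    index = np.zeros(rgb_mask.shape[:2], dtype=np.int64)  # colors not in the palette fall back to 0
    for color, class_id in palette.items():
        matches = np.all(rgb_mask == np.array(color, dtype=rgb_mask.dtype), axis=-1)
        index[matches] = class_id
    return index


def one_hot(index_mask, n_classes):
    """(H, W) class indices -> (H, W, n_classes) one-hot float array."""
    return np.eye(n_classes, dtype=np.float32)[index_mask]


# Tiny 2x2 example: water, building / road, background
rgb = np.array([[[0, 0, 255], [255, 0, 0]],
                [[128, 128, 128], [0, 0, 0]]], dtype=np.uint8)
idx = rgb_mask_to_index(rgb)
train_mask = one_hot(idx, n_classes=len(PALETTE))
print(idx)
```

Exact color matching is a deliberate choice here: if your masks were saved as JPEG, compression noise may change colors slightly, and a nearest-palette-color lookup would then be more robust.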

[I have assumed that you have more than two true output classes/labels, since you mentioned that your mask contains different colors for water, road, building, etc. If you have only two classes, then your model configuration is fine except for the train_mask processing.]
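In the two-class case the output stays a single sigmoid channel, so the target should be a (256, 256, 1) array of 0s and 1s rather than grayscale intensities divided by 255 (a colored pixel converted to grayscale lands somewhere between 0 and 1). A sketch, assuming a hypothetical `target_rgb` color marks the foreground class:

```python
import numpy as np


def binary_mask(rgb_mask, target_rgb):
    """1.0 where the pixel has exactly the target color, 0.0 elsewhere; shape (H, W, 1)."""
    hit = np.all(rgb_mask == np.array(target_rgb, dtype=rgb_mask.dtype), axis=-1)
    return hit.astype(np.float32)[..., np.newaxis]


rgb = np.array([[[0, 0, 255], [255, 0, 0]],
                [[0, 0, 255], [0, 0, 0]]], dtype=np.uint8)
m = binary_mask(rgb, target_rgb=(0, 0, 255))  # hypothetical: blue marks the foreground class
print(m[..., 0])
```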
