We are using greyscale eye images and corresponding ground-truth iris masks to train a CNN to segment the iris. Our dataset has about 2,000 images with their masks, both of size 224x224.
We used a U-Net architecture, as shown in the code below. To make sure that our network is working, we are trying to overfit it to 30 images from our dataset.
We used the Dice loss function (mean_iou was about 0.80), but when testing on the training images the results were poor: the predictions contained far more white pixels than the ground truth. We tried several optimizers (Adam, SGD, RMSprop) without significant difference.
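(For context, our Dice loss is essentially the standard smooth formulation; the sketch below assumes the TensorFlow Keras backend and a smoothing constant of 1, so details may differ slightly from our exact code.)

import tensorflow.keras.backend as K

def dice_loss(y_true, y_pred, smooth=1.0):
    # Flatten the masks and compute the soft Dice coefficient;
    # smooth avoids division by zero on empty masks.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return 1.0 - dice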
We removed the activation function from the last two Conv2D layers, which improved the mean_iou and the Dice loss, but we still had the same problem of smudges of white pixels (wrongly predicted foreground), though fewer than before.
Then we used a Tversky loss function to suppress the false positives (trying several alpha and beta values). The results improved, but when testing on the training images the network's predictions still weren't accurate.
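(For context, the Tversky index weights false positives and false negatives separately through alpha and beta; a minimal sketch along these lines, reusing K from above. The alpha/beta values shown are placeholders, not the final ones we tried, and note that some papers swap the roles of alpha and beta.)

def tversky_loss(y_true, y_pred, alpha=0.7, beta=0.3, smooth=1.0):
    # alpha weights false positives, beta weights false negatives;
    # alpha = beta = 0.5 recovers the Dice loss.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    tp = K.sum(y_true_f * y_pred_f)
    fp = K.sum((1.0 - y_true_f) * y_pred_f)
    fn = K.sum(y_true_f * (1.0 - y_pred_f))
    tversky = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - tversky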
We also added a callback to reduce the learning rate when the loss stops improving, which slightly improved the results.
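(The callback is Keras's ReduceLROnPlateau, roughly as below. The factor and patience shown are illustrative; min_lr matches the 1e-8 floor mentioned in the next paragraph.)

from tensorflow.keras.callbacks import ReduceLROnPlateau

# Halve the learning rate when the loss has not improved for 5 epochs.
lr_callback = ReduceLROnPlateau(monitor='loss', factor=0.5, patience=5,
                                min_lr=1e-8, verbose=1)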
In these runs we trained for 50 epochs at a time and would reach a point where the loss stopped decreasing at about 0.1, with mean_iou around 0.9. No matter how many more epochs we ran, neither value improved; they only fluctuated. The learning rate was already low (1e-5), and the callback would reduce it down to 1e-8, but still no further reduction in loss.
If anyone has experience with this or can offer any insight into how to overcome this problem, help would be appreciated.
from tensorflow.keras.layers import (Conv2D, Conv2DTranspose, MaxPooling2D, Dropout,
                                     BatchNormalization, Activation, concatenate)
from tensorflow.keras.models import Model

def conv2d_block(input_tensor, n_filters, kernel_size=3, batchnorm=True):
    # first layer
    x = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size),
               kernel_initializer="he_normal", padding="same")(input_tensor)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation("relu")(x)
    # second layer
    x = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size),
               kernel_initializer="he_normal", padding="same")(x)
    if batchnorm:
        x = BatchNormalization()(x)
    x = Activation("relu")(x)
    return x

def get_unet(input_img, n_filters=16, dropout=0.5, batchnorm=False):
    # contracting path
    c1 = conv2d_block(input_img, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)
    p1 = MaxPooling2D((2, 2))(c1)
    p1 = Dropout(dropout*0.5)(p1)

    c2 = conv2d_block(p1, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)
    p2 = MaxPooling2D((2, 2))(c2)
    p2 = Dropout(dropout)(p2)

    c3 = conv2d_block(p2, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)
    p3 = MaxPooling2D((2, 2))(c3)
    p3 = Dropout(dropout)(p3)

    c4 = conv2d_block(p3, n_filters=n_filters*8, kernel_size=3, batchnorm=batchnorm)
    p4 = MaxPooling2D(pool_size=(2, 2))(c4)
    p4 = Dropout(dropout)(p4)

    c5 = conv2d_block(p4, n_filters=n_filters*16, kernel_size=3, batchnorm=batchnorm)

    # expansive path
    u6 = Conv2DTranspose(n_filters*8, (3, 3), strides=(2, 2), padding='same')(c5)
    u6 = concatenate([u6, c4])
    u6 = Dropout(dropout)(u6)
    c6 = conv2d_block(u6, n_filters=n_filters*8, kernel_size=3, batchnorm=batchnorm)

    u7 = Conv2DTranspose(n_filters*4, (3, 3), strides=(2, 2), padding='same')(c6)
    u7 = concatenate([u7, c3])
    u7 = Dropout(dropout)(u7)
    c7 = conv2d_block(u7, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)

    u8 = Conv2DTranspose(n_filters*2, (3, 3), strides=(2, 2), padding='same')(c7)
    u8 = concatenate([u8, c2])
    u8 = Dropout(dropout)(u8)
    c8 = conv2d_block(u8, n_filters=n_filters*2, kernel_size=3, batchnorm=False)  # batchnorm hard-coded off here

    u9 = Conv2DTranspose(n_filters*1, (3, 3), strides=(2, 2), padding='same')(c8)
    u9 = concatenate([u9, c1], axis=3)
    u9 = Dropout(dropout)(u9)
    c9 = conv2d_block(u9, n_filters=n_filters*1, kernel_size=3, batchnorm=False)  # batchnorm hard-coded off here

    outputs = Conv2D(1, (1, 1), activation='sigmoid')(c9)
    model = Model(inputs=[input_img], outputs=[outputs])
    return model
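For completeness, this is roughly how the model is built and trained on our 224x224 greyscale images. The batch size and the X_train/y_train names are placeholders, and mean_iou stands for whichever IoU metric implementation is in use; the optimizer and loss varied between runs as described above.

from tensorflow.keras.layers import Input
from tensorflow.keras.optimizers import Adam

input_img = Input((224, 224, 1), name='img')   # single-channel (greyscale) input
model = get_unet(input_img, n_filters=16, dropout=0.5, batchnorm=True)
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss=dice_loss,        # or tversky_loss, depending on the run
              metrics=[mean_iou])    # placeholder for our IoU metric

model.fit(X_train, y_train, batch_size=8, epochs=50, callbacks=[lr_callback])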