Why is the categorical cross-entropy loss very high when training a U-Net model for multiclass semantic segmentation?

Question

I want to do semantic segmentation on a dataset of CMR images using a U-Net model. The model works well on other CMR images, but on the new dataset it behaves strangely. I used categorical cross-entropy as the loss function to segment the masks into 4 classes, including the background. This is the U-Net model I'm using (I got it from a GitHub page whose address I no longer remember):

from tensorflow import keras  # assumption: the tf.keras API is used

def down_block(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    c = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    c = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(c)
    p = keras.layers.MaxPool2D((2, 2), (2, 2))(c)
    return c, p

def up_block(x, skip, filters, kernel_size=(3, 3), padding="same", strides=1):
    us = keras.layers.UpSampling2D((2, 2))(x)
    concat = keras.layers.Concatenate()([us, skip])
    c = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(concat)
    c = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(c)
    return c

def bottleneck(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    c = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    c = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(c)
    return c

def UNet(image_size, nclasses=4, filters=64):
    f = [16, 32, 64, 128, 256]
    inputs = keras.layers.Input((image_size, image_size,1))
    
    p0 = inputs
    c1, p1 = down_block(p0, f[0]) #256 -> 128 ##(do we aim to get 16 feature maps? isn't it by using different masks?)
    c2, p2 = down_block(p1, f[1]) #128 -> 64
    c3, p3 = down_block(p2, f[2]) #64 -> 32
    c4, p4 = down_block(p3, f[3]) #32 -> 16

    bn = bottleneck(p4, f[4])

    u1 = up_block(bn, c4, f[3]) #16 -> 32
    u2 = up_block(u1, c3, f[2]) #32 -> 64
    u3 = up_block(u2, c2, f[1]) #64 -> 128
    u4 = up_block(u3, c1, f[0]) #128 -> 256
    
    outputs = keras.layers.Conv2D(nclasses, (1, 1), padding="same", activation="softmax")(u4)
    model = keras.models.Model(inputs, outputs)
    return model

image_size = 256
model = UNet(image_size)
optimizer = keras.optimizers.SGD(learning_rate=0.0001, momentum=0.9)
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=["accuracy"])

I also applied the to_categorical function to the mask images. The problem is that the predicted mask is a blank image, possibly because the model predicts only the background class as a result of an imbalanced dataset. Also, the loss starts at around 1.4 and only decreases to 1.3, which shows the model has learned very little. I would appreciate it if someone could explain a solution, if there is one.

P.S. Should I balance the dataset first? If yes, how?

score 1 · Answer 1 · answered Jul 27 '20 at 11:27

There are two problems with your method. First, you said you used the to_categorical function on the masks. That does not work with the sparse categorical cross-entropy loss, which expects integer class labels rather than one-hot vectors.

If you want to use to_categorical on your mask data, you have to use the CategoricalCrossentropy loss instead.
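The two label formats can be compared directly. Below is a minimal pure-Python sketch, not the Keras implementation: the helper names one_hot, categorical_ce and sparse_categorical_ce are hypothetical and only mirror what to_categorical and the two Keras losses compute for a single pixel, and the probabilities are made up.

```python
import math

def one_hot(label, num_classes):
    """Hypothetical stand-in for to_categorical on a single integer label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def categorical_ce(one_hot_label, probs):
    """Cross-entropy for a one-hot target: -sum(y_i * log(p_i))."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))

def sparse_categorical_ce(label, probs):
    """Cross-entropy for an integer target: -log(p[label])."""
    return -math.log(probs[label])

# Predicted class probabilities for one pixel (made-up values, 4 classes).
probs = [0.7, 0.1, 0.1, 0.1]
label = 2  # raw mask value for this pixel

loss_sparse = sparse_categorical_ce(label, probs)
loss_onehot = categorical_ce(one_hot(label, 4), probs)
print(loss_sparse, loss_onehot)  # the two values agree
```

Either pairing is consistent. What goes wrong is mixing them, for example feeding one-hot masks to the sparse loss, which expects integer class indices.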

If you want to use your raw mask data, with labels like 0, 1, 2, 3, you can use the SparseCategoricalCrossentropy loss, here with from_logits=True:

import tensorflow as tf

model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

Second, don't use softmax as the activation in the last layer. In the best case it will return the "edges" of your classes rather than their semantic segmentation. With from_logits=True the loss applies softmax itself, so using no activation function on the output layer will do the trick.
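A rough illustration of why the output layer and from_logits have to match, in pure Python with hypothetical helper names and made-up logits rather than real model output. It assumes the loss applies softmax to whatever the last layer emits. It shows two things: a model that gives all 4 classes equal probability has a loss of ln 4 ≈ 1.386, close to the 1.3 to 1.4 reported in the question, and applying softmax twice (once in the layer, once inside the loss) flattens the prediction and raises the loss.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_logits(label, logits):
    """Cross-entropy where the loss itself applies softmax (from_logits=True)."""
    return -math.log(softmax(logits)[label])

# Uniform prediction over 4 classes: the "has learned nothing" baseline.
baseline = ce_from_logits(0, [0.0, 0.0, 0.0, 0.0])  # equals ln(4)

# Made-up logits that favour the correct class 0.
logits = [2.0, 0.0, 0.0, 0.0]
single = ce_from_logits(0, logits)           # linear output layer
double = ce_from_logits(0, softmax(logits))  # softmax layer + from_logits=True

print(baseline, single, double)
```

With these numbers the double-softmax loss is higher than the single-softmax loss for the same logits, even though the prediction is correct in both cases.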

Thank you for your answer. Yes, you are right, I made a mistake with the loss function, but the problem was still there with categorical cross-entropy too. I think the model as defined was not learning from the dataset. So I changed the model architecture and that solved the problem. - Mahyar, Jul 27 '20 at 17:56
