
I am training a model to predict segmentation masks for medical images. In the training data, the input images are of type numpy.float64 and the ground-truth labels are of type numpy.uint8. The problem is that, for some reason, my model produces output of type numpy.float32.

[Image: example of the data types]

# Defining the model (imports assumed; they may come from tensorflow.keras instead of keras)
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D, BatchNormalization

segmenter = Model(input_img, segmenter(input_img))

# Training the model (type of train_ground is numpy.uint8)
segmenter_train = segmenter.fit(train_X, train_ground, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(valid_X, valid_ground))
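
For reference, a minimal sketch of how the dtypes can be compared (the predict call here is just for illustration and is not part of my training script):

print(train_X.dtype)                     # float64
print(train_ground.dtype)                # uint8
print(segmenter.predict(valid_X).dtype)  # float32 -- this is the surprising part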

Model definition:

def segmenter(input_img):
    #encoder
    #input = 28 x 28 x 1 (wide and thin)
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img) #28 x 28 x 32
    conv1 = BatchNormalization()(conv1)
    conv1 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv1)
    conv1 = BatchNormalization()(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1) #14 x 14 x 32
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool1) #14 x 14 x 64
    conv2 = BatchNormalization()(conv2)
    conv2 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv2)
    conv2 = BatchNormalization()(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2) #7 x 7 x 64
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool2) #7 x 7 x 128 (small and thick)
    conv3 = BatchNormalization()(conv3)
    conv3 = Conv2D(128, (3, 3), activation='relu', padding='same')(conv3)
    conv3 = BatchNormalization()(conv3)


    #decoder
    conv4 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv3) #7 x 7 x 64
    conv4 = BatchNormalization()(conv4)
    conv4 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv4)
    conv4 = BatchNormalization()(conv4)
    up1 = UpSampling2D((2,2))(conv4) # 14 x 14 x 64
    conv5 = Conv2D(32, (3, 3), activation='relu', padding='same')(up1) # 14 x 14 x 32
    conv5 = BatchNormalization()(conv5)
    conv5 = Conv2D(32, (3, 3), activation='relu', padding='same')(conv5)
    conv5 = BatchNormalization()(conv5)
    up2 = UpSampling2D((2,2))(conv5) # 28 x 28 x 32

    conv6 = Conv2D(64, (3, 3), activation='relu', padding='same')(up2) # 28 x 28 x 64
    conv6 = BatchNormalization()(conv6)
    conv6 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv6)
    conv6 = BatchNormalization()(conv6)
    up3 = UpSampling2D((2,2))(conv6) # 56 x 56 x 64

    conv7 = Conv2D(64, (3, 3), activation='relu', padding='same')(up3) # 56 x 56 x 64
    conv7 = BatchNormalization()(conv7)
    conv7 = Conv2D(64, (3, 3), activation='relu', padding='same')(conv7)
    conv7 = BatchNormalization()(conv7)
    up4 = UpSampling2D((2,2))(conv7) # 112 x 112 x 64

    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(up4) # 112 x 112 x 1
    return decoded

Thanks in advance for help on this :)

aksg87
  • Well, the structure and data types of the model are fixed at the time of constructing the model, before you call .fit() on it, so the type of data labels provided during training can't have any impact on it. What is the structure of the model, specifically, what exactly is the last layer? – Peteris Mar 26 '19 at 01:39
  • Thanks, @Peteris, I just added details on the model construction including the last layer. If anything else would be helpful please let me know! – aksg87 Mar 26 '19 at 02:46
  • There is no problem; the output of a neural network is always a real number. To get a binary decision you have to threshold the output. If you use a softmax activation, you select the class with the highest probability. – Dr. Snoopy Mar 26 '19 at 07:39

1 Answer


Sigmoid returns a real number

The last layer happens to be a sigmoid activation function. It returns a real number between 0 and 1, not an integer.

Furthermore, it's important that the error metric (the difference between the correct answer and the calculated value) is continuous rather than discrete, because a continuous metric is differentiable, which is what allows proper learning of the neural network weights with backpropagation.
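
For instance, a small illustrative sketch (not from the original answer) showing that the sigmoid maps any real-valued activation to a fraction strictly between 0 and 1:

import numpy as np

# sigmoid(x) = 1 / (1 + exp(-x)) maps any real number into the open interval (0, 1)
x = np.array([-2.0, 0.0, 3.0])
print(1.0 / (1.0 + np.exp(-x)))  # approx. [0.119, 0.5, 0.953] -- real numbers, never exact integers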

Just convert and round

For training the network, just convert the truth labels to floating point values.

Once you've trained the network and want to use its outputs, just round them to convert them to integers - sigmoid activation is well suited for that.
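
A minimal sketch of both steps, reusing the variable names from the question (casting to float32 is an assumption; float64 works just as well):

import numpy as np

# 1) Train on floating-point masks: cast the uint8 ground truth before fitting
train_ground_float = train_ground.astype('float32')
valid_ground_float = valid_ground.astype('float32')
segmenter.fit(train_X, train_ground_float, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(valid_X, valid_ground_float))

# 2) At inference time, round the sigmoid outputs back to integer labels
pred = segmenter.predict(valid_X)            # float32 values in (0, 1)
pred_labels = np.rint(pred).astype('uint8')  # 0 or 1 per pixel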

Peteris
  • Thanks, this makes perfect sense. The sigmoid function is only able to return a maximum value of 1 but the test data labels have values higher than 1. I can scale the training data labels so the maximum value is 1 which will allow the model to learn the correct output. – aksg87 Mar 26 '19 at 14:18
  • On another note, this was an interesting error for me! I must have a pretty high loss on the training and validation sets, since the function is unable to really match the target output due to the maximum value of 1. – aksg87 Mar 26 '19 at 14:19
  • @aksg87 If you want the network to return one of a set of discrete numbers, then sigmoid is likely a bad fit for the final layer. Instead you might want, e.g., a one-hot encoding with categorical cross-entropy if it's essentially a classification problem; or, if it's a scalar but discrete output, a linear or ReLU activation for the final layer. – Peteris Mar 26 '19 at 14:22
  • Yes, these different options make sense now, thanks! I am predicting an image mask with discrete values for each pixel (0,1,2,3,4). I think the problem is similar to this other post (https://stackoverflow.com/questions/45178513/how-to-load-image-masks-labels-for-image-segmentation-in-keras). Cross-entropy loss looks like the best fit for now! – aksg87 Mar 26 '19 at 18:05
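
For the multi-class case described in these comments (integer pixel labels 0-4), a hedged sketch of the change being discussed: replace the single sigmoid channel with a softmax over 5 channels and use a cross-entropy loss. Layer and loss names are standard Keras; the optimizer choice and the rest of the pipeline are assumptions, not the asker's actual code.

import numpy as np

# Final layer: one output channel per class instead of a single sigmoid channel
decoded = Conv2D(5, (3, 3), activation='softmax', padding='same')(up4)  # H x W x 5
segmenter = Model(input_img, decoded)

# Integer masks (values 0-4) can be used directly with the sparse cross-entropy loss
segmenter.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
segmenter.fit(train_X, train_ground, batch_size=batch_size, epochs=epochs)

# Predicted integer label per pixel: take the argmax over the class channels
pred_labels = np.argmax(segmenter.predict(valid_X), axis=-1).astype('uint8')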