Training 3dconv neural network fails; loss converges at .6931

Question

I wrote a script to train a neural network that takes .nii files as input, following the TensorFlow tutorial here: https://www.tensorflow.org/tutorials/load_data/images. I changed it slightly to work with NiBabel and .nii files, but it still follows the same basic structure. However, my loss converges to 0.6931 regardless of the input, image shape, or batch size, which I assume means the model has started guessing the same thing every time. I therefore believe the model is not learning. Can anyone identify any fatal flaws in my code? I've already tried:

Callbacks with changing LR
Changing the data, cleaning it, and reorganizing it
Changing the proportions of the amount of each class
Using different optimizers and loss functions
Using a simple dense, dense, dense model but that does not seem to work as it does not even want to begin training
Using a repeating dataset as well as a fixed size (Although it is unclear to me what difference that makes)

# Gets the label of the image, the label determines how tensorflow will classify the image
def get_label(file_path):
    # Convert the path to a list of path components
    parts = tf.strings.split(file_path, os.path.sep)
    # The fourth last is the class-directory
    return float(parts[-4] == "class1")


# Reads the data from a .nii file and returns a NumPy ndarray that is compatible with tensorflow
def decode_img(img):
    img = nib.load(img.numpy().decode('utf-8'))
    # convert the compressed string to a NumPy ndarray
    data = img.get_fdata()
    # Resize img
    data = np.resize(data, imgshape)
    # Normalize
    max = np.amax(data)
    min = np.amin(data)
    data = ((data-min)/(max-min))
    return data


# Processes a path to return a image data and label pair
def process_path(file_path):
    # Gets the files label
    label = get_label(file_path)
    img = decode_img(file_path)
    return img, label

I use these functions to process my data by mapping them over my file-list datasets.

def configure_for_performance(ds):
    #ds = ds.cache(filename='cachefile')
    ds = ds.cache()
    ds = ds.shuffle(buffer_size=1000)
    ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

I pulled this directly from the TensorFlow tutorial.

# Create a sequential network
model = tf.keras.Sequential([
    tf.keras.layers.Convolution3D(
        4, 4, padding='same', data_format="channels_last", input_shape=imgshape, activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Convolution3D(4, 4, padding='same', activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Convolution3D(4, 4, padding='same', activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Convolution3D(4, 4, padding='same', activation='tanh'),
    tf.keras.layers.MaxPooling3D(padding='same'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2048, activation='tanh'),
    tf.keras.layers.Dense(1024, activation='tanh'),
    tf.keras.layers.Dense(512, activation='tanh'),
    tf.keras.layers.Dense(256, activation='tanh'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.summary()
model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=500,
    steps_per_epoch=BATCH_SIZE,
    validation_steps=BATCH_SIZE
)

This is my model. I'm using 3dconv layers in the same way 2dconv layers are used in conventional image classification.

Any advice would be appreciated!

Can you add training logs that are generated after running model.fit()? — Aniket Bote, Aug 29 '20 at 01:55

score 0 · Answer 1 · answered Aug 29 '20 at 02:21

Your code for fetching the images looks good; however, I cannot test it myself because I'm not sure how your data is stored. Also, the fact that your model begins training suggests the error is probably not there. If you want to be sure, you can use matplotlib to display the images and confirm that they loaded properly.
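A minimal sketch of that matplotlib check, assuming each .nii file holds a single 3D volume and that `target_shape` is the three spatial dimensions of `imgshape` (drop the channel axis if there is one). The helper names `middle_slice` and `show_volume` are made up for illustration. It plots the volume both as loaded and after the `np.resize` call from the question, because `np.resize` only truncates or repeats the flattened values and does not interpolate, so the resized image is worth looking at too.

```python
import numpy as np


def middle_slice(volume):
    """Return the central 2D slice along the third axis of a 3D volume."""
    return volume[:, :, volume.shape[2] // 2]


def show_volume(path, target_shape):
    """Plot the middle slice of a .nii file before and after np.resize."""
    # Imported here so middle_slice works without these packages installed.
    import matplotlib.pyplot as plt
    import nibabel as nib

    data = nib.load(path).get_fdata()
    resized = np.resize(data, target_shape)  # same call as in the question

    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(middle_slice(data), cmap="gray")
    axes[0].set_title("as loaded")
    axes[1].imshow(middle_slice(resized), cmap="gray")
    axes[1].set_title("after np.resize")
    plt.show()


# np.resize re-reads the flattened values in order; it does not rescale.
demo = np.arange(6).reshape(2, 3)
print(np.resize(demo, (2, 2)))  # keeps only the first four flattened values
```

Calling `show_volume` on a few files from each class directory should show recognisable anatomy in both panels; if the right-hand panel looks scrambled, the resize step is a likely suspect.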

I would begin by making your model as simple as it can be while still working, then test whether it still converges to 0.6931 or to some other number. Then try a different activation function, e.g. ReLU. Another approach is to use batch normalisation. My theory is that very large positive or negative values are going into your tanh function, which pushes the output to nearly -1 or 1 every time. This also prevents further training, because the gradient there is very small. Changing to ReLU may circumvent this problem for large positive values, but maybe not for negative ones. Batch normalisation will bring your values away from the extremes where the tanh output is only -1 or 1.
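The saturation argument can be illustrated numerically. This is a small sketch using only the standard library; `tanh_grad` is a made-up helper name. It evaluates the derivative of tanh, 1 - tanh(x)^2, at a few inputs to show how quickly the gradient vanishes once the input moves away from zero in either direction.

```python
import math


def tanh_grad(x):
    """Derivative of tanh at x, i.e. 1 - tanh(x)**2."""
    return 1.0 - math.tanh(x) ** 2


# The output flattens out near -1 or 1 and the gradient shrinks towards zero.
for x in (-10.0, -3.0, 0.0, 3.0, 10.0):
    print(f"x={x:6.1f}  tanh={math.tanh(x):+.6f}  grad={tanh_grad(x):.2e}")
```

If the pre-activation values in the network are on this scale, very little gradient reaches the earlier layers, which is the situation batch normalisation is meant to avoid.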

score 0 · Answer 2 · answered Aug 29 '20 at 09:30

If you consistently converge to exactly the same loss, then in my experience there is only one explanation: the data loader is coded incorrectly, so the images and the labels do not match. The model is then trying to learn pure randomness, and the best it can do is output the 'average' correct answer. I suspect that the 0.69 value comes from your data labels, e.g. you have 69% Class 1's and 31% Class 0's.
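The 0.69 guess can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the helper names `bce` and `constant_prediction_loss` are made up, and it considers a model that outputs the same probability for every sample. Under those assumptions, 0.6931 matches ln 2, the binary cross-entropy of always predicting 0.5, and that value does not depend on the class balance. A model that always predicted the class frequency on a 69/31 split would instead sit near 0.619.

```python
import math


def bce(p, y):
    """Binary cross-entropy of predicted probability p for a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def constant_prediction_loss(p, positive_fraction):
    """Mean BCE when the model predicts p for every sample in the dataset."""
    return positive_fraction * bce(p, 1) + (1 - positive_fraction) * bce(p, 0)


# Always predicting 0.5 gives ln 2, whatever the class balance.
print(round(constant_prediction_loss(0.5, 0.5), 4))   # 0.6931
print(round(constant_prediction_loss(0.5, 0.69), 4))  # 0.6931

# Always predicting the class frequency on a 69/31 split gives about 0.619.
print(round(constant_prediction_loss(0.69, 0.69), 4))
```

So a loss stuck at 0.6931 is consistent with the network outputting 0.5 for every input, which fits the idea that it has learned nothing from the images, whether the cause is mismatched labels or something else.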
