I am trying to use a custom loss function (critical success index) with my simple CNN (for 64x64 pixel images) in TensorFlow, but I am getting a list of Nones for the gradients.

Here is the custom loss function:

import tensorflow as tf
from keras import backend as K

@tf.function
def custom_csi_loss(y_true, y_pred):
    # Define the target class
    target_class = 1

    # Calculate the true positives, false positives, and false negatives
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    false_positives = K.sum(K.round(K.clip(y_pred - y_true, 0, 1)))
    false_negatives = K.sum(K.round(K.clip(y_true - y_pred, 0, 1)))

    # Calculate the CSI
    csi = true_positives / (true_positives + false_negatives + false_positives)

    # Return the negative of the CSI as the loss (since we want to minimize the loss)
    return -csi
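
The loss seems to give no gradient even in isolation. A minimal check with dummy tensors (the shapes and values below are arbitrary) reproduces the behavior:

import tensorflow as tf

y_true = tf.constant([[0.0], [1.0], [1.0], [0.0]])
y_pred = tf.Variable([[0.2], [0.7], [0.9], [0.1]])

with tf.GradientTape() as tape:
    loss = custom_csi_loss(y_true, y_pred)

print(tape.gradient(loss, y_pred))  # prints None, just like in train_step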

and here is the model:

from keras.layers import Input, BatchNormalization, Conv2D, Reshape
from keras.models import Model

def build_scnn(shape=(128, 128, 3), k_init="he_normal", dilation_rate=(1, 1), dtype=tf.float32):
    inputs = Input(shape=shape)
    normalized = BatchNormalization(axis=3)(inputs)

    x = Conv2D(64, 3, padding="same", activation="relu", kernel_initializer=k_init)(normalized)
    x = Conv2D(128, 3, padding="same", activation="relu", dilation_rate=dilation_rate, kernel_initializer=k_init)(x)
    x = Conv2D(128, 3, padding="same", activation="relu", kernel_initializer=k_init)(x)

    outputs = Conv2D(1, 1, padding="same", activation="sigmoid", dtype=dtype)(x)
    outputs = Reshape((64 * 64, 1))(outputs)  # flatten the 64x64 map to (4096, 1)
    scnn = Model(inputs, outputs, name="SCNN")
    return scnn

scnn = build_scnn(shape=(64, 64, len(gdf[features].columns)),
                  k_init=k_init,
                  dilation_rate=dilation_rate)
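
As a sanity check on the wiring, a dummy forward pass (3 input channels and a batch of 8, chosen just for illustration) gives the expected output shape:

import numpy as np

demo = build_scnn(shape=(64, 64, 3))
dummy = np.zeros((8, 64, 64, 3), dtype="float32")
print(demo(dummy).shape)  # -> (8, 4096, 1)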

This is the training step:

@tf.function
def train_step(x, y):
    with tf.GradientTape(watch_accessed_variables=True) as tape:
        tape.watch(scnn.trainable_variables)
        y_pred = scnn(x, training=True)
        loss = loss_fn(y, y_pred)
    gradients = tape.gradient(loss, scnn.trainable_variables)  # differentiate loss wrt scnn weights
    print(f"gradients: {gradients}")  # note: a plain print only runs while the tf.function is traced
    optimizer.apply_gradients(zip(gradients, scnn.trainable_variables))
    return loss, y_pred
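
For completeness, loss_fn and the optimizer are set up along these lines (the Adam learning rate shown is a placeholder, not my exact value):

loss_fn = custom_csi_loss
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # learning rate is illustrative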

and here is the main body of the code:

for epoch in range(epochs):
    epoch_loss = 0
    epoch_csi = 0
    num_batches = 0
    for x, y, w in train.map(weight_func):
        y = tf.cast(y, dtype=tf.float32)
        loss, y_pred = train_step(x, y)
        epoch_loss += loss
        epoch_csi += metrics[0](y, y_pred)
        num_batches += 1
    
    epoch_loss /= num_batches
    epoch_csi /= num_batches

The gradients variable is always [None, None, None, ...], so the rest of the code fails. The code works with keras.losses.BinaryCrossentropy and other built-in binary losses, so as far as I can tell the issue must be with the custom_csi_loss function. I have checked the shapes and data types of y and y_pred and they are consistent.
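
For example, swapping in the built-in loss like this makes tape.gradient return real tensors again:

from keras.losses import BinaryCrossentropy

loss_fn = BinaryCrossentropy()  # with this in place, train_step gets real gradients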

I have looked at similar questions about getting None gradients from a custom loss, but their answers didn't solve my problem.


(Please help!)
