I have this very deep model:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.regularizers import l2


def get_model2(mask_kind):
    decay = 0.0

    inp_1 = keras.Input(shape=(64, 101, 1), name="RST_inputs")
    x = layers.Conv2D(256, kernel_size=(3, 3), kernel_regularizer=l2(1e-6), strides=(3, 3), padding="same")(inp_1)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2D(128, kernel_size=(3, 3), kernel_regularizer=l2(1e-6), strides=(3, 3), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2D(64, kernel_size=(2, 2), kernel_regularizer=l2(1e-6), strides=(2, 2), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2D(32, kernel_size=(2, 2), kernel_regularizer=l2(1e-6), strides=(2, 2), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Dense(256)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    out1 = layers.Dense(128, name="ls_weights")(x)

    if mask_kind == 1:  # apply the first mask
        binary_mask = layers.Lambda(mask_layer1, name="lambda_layer1", dtype='float64')(out1)
        print('shape', binary_mask.shape[0])
    elif mask_kind == 2:  # apply the second mask
        binary_mask = layers.Lambda(mask_layer2, name="lambda_layer2", dtype='float64')(out1)
    else:  # apply no mask
        binary_mask = out1

    x = layers.Dense(256)(binary_mask)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Dense(512)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Dense(192)(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Reshape((2, 2, 48))(x)
    x = layers.Conv2DTranspose(32, kernel_size=(2, 2), strides=(2, 2), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2DTranspose(64, kernel_size=(3, 3), strides=(3, 3), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2DTranspose(128, kernel_size=(3, 3), strides=(3, 3), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    x = layers.Conv2DTranspose(256, kernel_size=(3, 3), strides=(5, 5), padding="same")(x)
    x = layers.LeakyReLU(alpha=0.3)(x)
    soundfield_layer = layers.Conv2DTranspose(1, kernel_size=(1, 1), strides=(1, 1), padding='same')(x)
    # soundfield_layer = layers.Dense(40000, name="sf_vec")(x)

    if mask_kind == 1:
        model = keras.Model(inp_1, [binary_mask, soundfield_layer], name="2_out_model")
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1, decay=decay),  # if needed, put back 0.001
                      loss=["mse", "mse"], loss_weights=[1, 1])
        # plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
        model.summary()

    else:
        model = keras.Model(inp_1, [binary_mask, soundfield_layer], name="2_out_model")
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1, decay=decay),  # if needed, put back 0.001
                      loss=["mse", "mse"], loss_weights=[0, 1])
        # plot_model(model, to_file='model.png', show_shapes=True, show_layer_names=True)
        model.summary()

    return model

and I'm trying to use learning-rate step decay to see whether I can improve the validation loss during training. I define the scheduler class as follows:

import numpy as np

class StepDecay:
    def __init__(self, initAlpha=0.1, factor=0.25, dropEvery=30):
        # store the base initial learning rate, drop factor, and
        # epochs to drop every
        self.initAlpha = initAlpha
        self.factor = factor
        self.dropEvery = dropEvery
    
    def __call__(self, epoch):
        # compute the learning rate for the current epoch
        exp = np.floor((1 + epoch) / self.dropEvery)
        alpha = self.initAlpha * (self.factor ** exp)
        # return the learning rate
        return float(alpha)
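
For reference, evaluating the schedule by hand (just calling the __call__ above) gives:

schedule = StepDecay(initAlpha=0.1, factor=0.25, dropEvery=30)
for epoch in [0, 29, 59, 89]:
    print(epoch, schedule(epoch))
# 0  -> 0.1
# 29 -> 0.025
# 59 -> 0.00625
# 89 -> 0.0015625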

and then I run my training:

from tensorflow.keras.callbacks import LearningRateScheduler

schedule = StepDecay(initAlpha=1e-1, factor=0.25, dropEvery=30)
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50)
callbacks = [es, LearningRateScheduler(schedule)]

model = get_model2(mask_kind=1)

history = model.fit(X_train, [Y_train, Z_train], validation_data=(X_val, [Y_val, Z_val]), epochs=300,
                    batch_size=32,
                    callbacks=callbacks, verbose=1)

test_loss, _, _ = model.evaluate(X_test, [Y_test, Z_test], verbose=1)
print('Test: %.3f' % test_loss)

but when I train I get "nan" losses:

25/25 [==============================] - 17s 684ms/step - loss: nan - lambda_layer1_loss: nan - conv2d_transpose_4_loss: nan - val_loss: nan - val_lambda_layer1_loss: nan etc....

and I don't understand why. My suspicion is the decay argument: it is a parameter of the SGD optimizer but, according to the documentation, it does not exist for Adam. Still, I get no error, so... any ideas?

  • It has something to do with your callbacks, can you post the code for the LearningRateScheduler(schedule) function you have? – BoomBoxBoy Jan 27 '21 at 16:33
  • the function is in the class, it is the __call__ function that is called – Ciccios_1518 Jan 27 '21 at 17:18
  • Anyway, I tried the SGD optimizer and with that I have no problem, even though my loss does not seem to decrease at all after 300 epochs; but I'd like to know whether I can use this callback with Adam – Ciccios_1518 Jan 27 '21 at 17:20
  • I think that's where the problem is. If you are using the code above, the model will be updating your learning rate based on the LearningRateScheduler(schedule) function instead. If you remove the LearningRateScheduler(schedule) from the callback list, that might work? – BoomBoxBoy Jan 27 '21 at 17:29
  • And that's what I want: I want the learning rate to be updated according to the LearningRateScheduler(schedule) – Ciccios_1518 Jan 27 '21 at 17:36

1 Answer

You can play with the parameters to find a good balance, but this is one way to use exponential decay as a callback function with the Adam optimizer.

import tensorflow as tf

LR_MAX = 0.0001
LR_MIN = 0.00001
LR_EXP_DECAY = 0.85

def lrfn(epoch):
    # start at LR_MAX and decay exponentially toward LR_MIN
    lr = (LR_MAX - LR_MIN) * LR_EXP_DECAY ** epoch + LR_MIN
    return lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=True)
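
For a sense of scale, these are a few of the values this schedule produces (just evaluating the lrfn defined above):

for epoch in [0, 5, 10, 20]:
    print(epoch, lrfn(epoch))
# 0  -> 1.0e-04
# 5  -> ~5.0e-05
# 10 -> ~2.8e-05
# 20 -> ~1.3e-05  (approaching LR_MIN)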

Then simply pass the callback to model.fit, as in the following example.

model.fit(..
          ..
          callbacks = [lr_callback],
          ..
          ..)
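
For the model in the question, this just means swapping the new callback into the existing callbacks list; a sketch reusing the variables already defined in the question:

es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=50)
callbacks = [es, lr_callback]

model = get_model2(mask_kind=1)
history = model.fit(X_train, [Y_train, Z_train],
                    validation_data=(X_val, [Y_val, Z_val]),
                    epochs=300, batch_size=32,
                    callbacks=callbacks, verbose=1)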