How can I print the Learning Rate at each epoch with Adam optimizer in Keras?

Question

Because online learning does not work well with Keras when you are using an adaptive optimizer (the learning rate schedule resets when calling .fit()), I want to see if I can just manually set it. However, in order to do that, I need to find out what the learning rate was at the last epoch.

That said, how can I print the learning rate at each epoch? I think I can do it through a callback but it seems that you have to recalculate it each time and I'm not sure how to do that with Adam.

I found this in another thread but it only works with SGD:

class SGDLearningRateTracker(Callback):
    def on_epoch_end(self, epoch, logs={}):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr * (1. / (1. + optimizer.decay * optimizer.iterations)))
        print('\nLR: {:.6f}\n'.format(lr))

Your question doesn't have an answer. Adam does not have a single learning rate. — Ricardo Magalhães Cruz, Apr 03 '18 at 16:48
I have been using CSVLogger to log my metrics and noticed that it already records the learning rate. There was no need to create a custom metric for this purpose. — Daniel Ferber, Dec 03 '20 at 02:41

Andrey · Answer 1 · 2020-11-29T08:57:55.560

12

I am using the following approach, which is based on @jorijnsmit answer:

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer._decayed_lr(tf.float32) # I use ._decayed_lr method instead of .lr
    return lr

optimizer = keras.optimizers.Adam()
lr_metric = get_lr_metric(optimizer)

model.compile(
    optimizer=optimizer,
    metrics=['accuracy', lr_metric],
    loss='mean_absolute_error', 
    )

It works with Adam.

edited Nov 29 '20 at 08:57

answered Nov 29 '20 at 08:45

Andrey

5,932
3
17
35

This one worked for me, the original didn't when using with a TPU in Kaggle. – Manuel Dec 14 '20 at 20:53
I've tried, but this does not output the learning rate at each epoque but at each batch iteration – Luca Foppiano Jun 15 '23 at 01:28

score 6 · Answer 2 · answered Dec 15 '19 at 20:26

6

I found this question very helpful. A minimal workable example that answers your question would be:

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr

optimizer = keras.optimizers.Adam()
lr_metric = get_lr_metric(optimizer)

model.compile(
    optimizer=optimizer,
    metrics=['accuracy', lr_metric],
    loss='mean_absolute_error', 
    )

answered Dec 15 '19 at 20:26

gosuto

5,422
6
36
57

This does answer seems copied by the one [before](https://stackoverflow.com/a/65058380/4886772) not work because the returned lr must be a tensor. – Luca Foppiano Jun 15 '23 at 01:29
@LucaFoppiano before? my answer dates from 2019, versus 2020 for the one you link. – gosuto Jun 24 '23 at 04:29
You're right. I did not see the date. Sorry about that. But perhaps maybe you could add the version of tensorflow you've been using at the time. – Luca Foppiano Jun 25 '23 at 22:59

score 5 · Answer 3 · answered Mar 16 '21 at 12:09

For everyone that is still confused on this topic:

The solution from @Andrey works but only if you set a decay to your learning rate, you have to schedule the learning rate to lower itself after 'n' epoch, otherwise it will always print the same number (the starting learning rate), this is because that number DOES NOT change during training, you can't see how the learning rates adapts, because every parameter in Adam has a different learning rate that adapts itself during the training, but the variable lr NEVER changes

score 1 · Answer 4 · answered Nov 27 '17 at 09:26

1

class MyCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.lr
        # If you want to apply decay.
        decay = self.model.optimizer.decay
        iterations = self.model.optimizer.iterations
        lr_with_decay = lr / (1. + decay * K.cast(iterations, K.dtype(decay)))
        print(K.eval(lr_with_decay))

Follow this thread.

answered Nov 27 '17 at 09:26

Tushar Gupta

1,603
13
20

2

That's not the learning rate used by Adam. That's SGD with decay. – Ricardo Magalhães Cruz Apr 03 '18 at 16:37

score 1 · Answer 5 · answered Aug 06 '18 at 06:52

This piece of code might help you. It is based on Keras implementation of Adam optimizer (beta values are Keras defaults)

from keras import Callback
from keras import backend as K
class AdamLearningRateTracker(Callback):
    def on_epoch_end(self, logs={}):
        beta_1=0.9, beta_2=0.999
        optimizer = self.model.optimizer
        if optimizer.decay>0:
            lr = K.eval(optimizer.lr * (1. / (1. + optimizer.decay * optimizer.iterations)))
        t = K.cast(optimizer.iterations, K.floatx()) + 1
        lr_t = lr * (K.sqrt(1. - K.pow(beta_2, t)) /(1. - K.pow(beta_1, t)))
        print('\nLR: {:.6f}\n'.format(lr_t))

Your snippet is a good baseline, but it has multiple errors (at least for Keras 2.2.4 with Tensorflow 1.13.1): 1. The function signature should be `def on_epoch_end(self, epoch, logs=None):` 2. `optimizer.decay` is a tensor and cannot be evaluated as bool, needs a `K.eval` 3. `lr` is not defined if `decay` is 0 4. `lr_t` is a tensor and cannot be used in a format string, needs a `K.eval` as well — Arno Hilke, May 28 '19 at 13:46

How can I print the Learning Rate at each epoch with Adam optimizer in Keras?

5 Answers5

Linked