
Is there any function or method that can show the learning rate when I use a TensorFlow 2.0 custom training loop?

Here is an example from the TensorFlow guide:

def train_step(images, labels):
  with tf.GradientTape() as tape:
    predictions = model(images)
    loss = loss_object(labels, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

  train_loss(loss)
  train_accuracy(labels, predictions)

How can I retrieve the current learning rate from the optimizer when the model is training?

I will be grateful for any help you can provide. :)

yun

3 Answers


In TensorFlow 2.1, the Optimizer class has an undocumented method _decayed_lr (see the definition here), which you can invoke in the training loop by supplying the variable type to cast to:

current_learning_rate = optimizer._decayed_lr(tf.float32)

Here's a more complete example with TensorBoard too.

train_step_count = 0
summary_writer = tf.summary.create_file_writer('logs/')
def train_step(images, labels):
  global train_step_count  # update the module-level step counter
  train_step_count += 1
  with tf.GradientTape() as tape:
    predictions = model(images)
    loss = loss_object(labels, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))

  # optimizer._decayed_lr(tf.float32) is the current Learning Rate.
  # You can save it to TensorBoard like so:
  with summary_writer.as_default():
    tf.summary.scalar('learning_rate',
                      optimizer._decayed_lr(tf.float32),
                      step=train_step_count)
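Since _decayed_lr is a private method and may change between releases, a hedged alternative, assuming you passed a tf.keras.optimizers.schedules.LearningRateSchedule as the learning rate, is to evaluate the schedule yourself at the optimizer's current iteration count:

# Sketch: assumes `optimizer` already exists and was (possibly) built with a
# LearningRateSchedule, e.g. tf.keras.optimizers.schedules.ExponentialDecay(...).
lr = optimizer.learning_rate
if isinstance(lr, tf.keras.optimizers.schedules.LearningRateSchedule):
  current_learning_rate = lr(optimizer.iterations)  # tensor for the current step
else:
  current_learning_rate = lr  # plain float or tf.Variable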
P Shved

In a custom training loop, you can call print(optimizer.lr.numpy()) to get the current learning rate.
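For example, a minimal sketch (assuming model, optimizer, loss_object and a dataset of (images, labels) batches already exist):

import tensorflow as tf

for images, labels in dataset:
    with tf.GradientTape() as tape:
        loss = loss_object(labels, model(images, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # For a constant learning rate, optimizer.lr is a tf.Variable:
    print("learning rate:", optimizer.lr.numpy())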

If you are using the Keras API, you can define your own callback that records the current learning rate.

from tensorflow.keras.callbacks import Callback

class LRRecorder(Callback):
    """Record current learning rate. """
    def on_epoch_begin(self, epoch, logs=None):
        lr = self.model.optimizer.lr
        print("The current learning rate is {}".format(lr.numpy()))

# your other callbacks 
callbacks.append(LRRecorder())

Update

In Adam, the weight update is roughly

w := w - base_lr * m / (sqrt(v) + eps)

which you can read as w := w - act_lr * grad. The learning rate we get above is base_lr; however, act_lr changes adaptively during training. Taking the Adam optimizer as an example, act_lr is determined by base_lr, m and v, where m and v are the first and second moment estimates of the gradient. Different parameters have different m and v values, so if you would like to know act_lr, you need to know the variable's name. For example, to find the act_lr of the variable Adam/dense/kernel, you can access m and v like this,

for var in optimizer.variables():
  if 'Adam/dense/kernel/m' in var.name:
    print(var.name, var.numpy())

  if 'Adam/dense/kernel/v' in var.name:
    print(var.name, var.numpy())

Then you can calculate act_lr from these values using the formula above; a brief sketch follows.
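A minimal sketch (assuming a tf.keras Adam optimizer and a layer variable named dense/kernel, so the slot variables are Adam/dense/kernel/m and Adam/dense/kernel/v; Adam's bias correction is ignored here):

import numpy as np

base_lr = optimizer.lr.numpy()
# Adam's epsilon hyperparameter; fall back to the Keras default if absent.
eps = getattr(optimizer, 'epsilon', 1e-7)

m = v = None
for var in optimizer.variables():
    if 'Adam/dense/kernel/m' in var.name:
        m = var.numpy()
    if 'Adam/dense/kernel/v' in var.name:
        v = var.numpy()

if m is not None and v is not None:
    # w := w - base_lr * m / (sqrt(v) + eps), so the per-parameter
    # effective scaling factor is:
    act_lr = base_lr / (np.sqrt(v) + eps)
    print(act_lr.shape)  # one value per weight in dense/kernel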

zihaozhihao
  • I understand I can get the learning rate with print(optimizer.lr.numpy()). But if I use Adam or another adaptive optimizer, the learning rate should change as training progresses; however, when I checked the value of optimizer.lr, it did not change. – yun Sep 29 '19 at 03:15
  • @yun this way you get the constant term of the learning rate. I think you have already found out this detail. Did you figure out how to access the value of the effective learning rate? – Siderius Dec 05 '21 at 16:01

I had the same question, but I think it is not well posed. We know that Adam calculates the learning rate in terms of the past gradients of the loss function with respect to the weight being considered.

So suppose a function existed whose output is Adam's adaptive learning rate; it would return as many learning rate values as there are neural network weights.

In fact, following the procedure suggested by zihaozhihao:

for var in actor_optimizer.variables():
  if 'Adam/dense/kernel/m' in var.name:
    print(var.name, len(var.numpy()), len(var.numpy()[0]))

  if 'Adam/dense/kernel/v' in var.name:
    print(var.name, len(var.numpy()), len(var.numpy()[0]))

you don't get an object of length one, but something whose size depends on the neural network architecture.

On the other hand, optimizers like SGD use the same lr for every weight, so in that case you can uniquely define a learning rate (see the brief sketch below).
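A minimal sketch of that contrast (the optimizer construction here is hypothetical):

import tensorflow as tf

sgd = tf.keras.optimizers.SGD(learning_rate=0.01)

# Plain SGD applies w := w - lr * grad with the same scalar lr for every
# weight, so this single value fully describes the step size:
print(sgd.learning_rate.numpy())  # 0.01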

Siderius