
I'd like to print out the learning rate for each training step of my neural network.

I know that Adam has an adaptive learning rate, but is there a way I can see this (for visualization in TensorBoard)?

kmace

  • By quickly reading the code, you can get lr: print sess.run(adam_op._lr_t), after having adam_op = tf.train.AdamOptimizer(0.1, beta1=0.5, beta2=0.5), train_op = adam_op.minimize(cost). However, I'm not sure it works in your code. Can you quickly test? – Sung Kim May 02 '16 at 22:19
  • Side note: the right way to think about Adam is not as a learning rate (scaling the gradients), but as a step size. The `learning_rate` you pass in is the maximum step size (per parameter); Adam takes steps up to that size, depending on how consistent the gradient is. – mdaoust Apr 23 '17 at 20:56
  • OK @mdaoust, but then how can I obtain the learning rate at each step? I tried Sung Kim's suggestion but it does not work, as it returns a flat line. Thanks. – Escachator Apr 29 '17 at 15:13

6 Answers

25

All the optimizers have a private variable that holds the value of the learning rate.

In AdaGrad and gradient descent it is called self._learning_rate; in Adam it is self._lr.

So you just need to print sess.run(optimizer._lr) to get this value. sess.run is needed because it is a tensor.
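For reference, a minimal TF 1.x sketch of this (the toy variable and loss are placeholders; note that depending on the TF version, optimizer._lr may be a plain Python float while optimizer._lr_t is the corresponding tensor created once minimize() has run):

import tensorflow as tf  # TF 1.x

x = tf.Variable(1.0)
loss = tf.square(x)

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss)  # _prepare() runs here and creates _lr_t

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    # _lr_t is the learning-rate tensor; _lr may just echo the float passed in.
    print(sess.run(optimizer._lr_t))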

Salvador Dali

  • Note: in tf2.x, with a learning rate schedule in use, you should use (for Adam) `lr = optimizer._decayed_lr(tf.float32)` – Gouda Nov 04 '19 at 06:19
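For reference, a minimal TF 2.x sketch of the `_decayed_lr` approach from the comment above (assuming the OptimizerV2-based tf.keras optimizers, where `_decayed_lr` is a private helper that evaluates any attached schedule at the current `optimizer.iterations`):

import tensorflow as tf  # TF 2.x, OptimizerV2-based optimizers

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.95)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# Evaluates the schedule at optimizer.iterations and casts to the requested dtype.
print(float(optimizer._decayed_lr(tf.float32)))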
12

Sung Kim's suggestion worked for me; my exact steps were:

lr = 0.1
step_rate = 1000
decay = 0.95

global_step = tf.Variable(0, trainable=False)
increment_global_step = tf.assign(global_step, global_step + 1)
learning_rate = tf.train.exponential_decay(lr, global_step, step_rate, decay, staircase=True)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=0.01)
trainer = optimizer.minimize(loss_function)

# Some code here

print('Learning rate: %f' % (sess.run(optimizer._lr)))
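For context, a hedged sketch of what the elided training loop might look like (the feed dict and step count here are assumptions, not part of the original answer); note that global_step has to be advanced, otherwise exponential_decay never actually decays:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # Run the training op and advance global_step so the decay takes effect.
        sess.run([trainer, increment_global_step], feed_dict=feed)  # `feed` is a placeholder
        if step % 100 == 0:
            # optimizer._lr holds the learning_rate tensor passed to the constructor.
            print('Learning rate: %f' % sess.run(optimizer._lr))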
X. Serra

  • I use `GradientDescentOptimizer` and `self._learning_rate`. It does not work for me; I get the error `AttributeError: 'Operation' object has no attribute '_learning_rate'` – ARAT May 27 '18 at 04:01
4

I think the easiest thing you can do is subclass the optimizer.

It has several methods that I guess get dispatched to based on variable type. Regular dense variables seem to go through _apply_dense; this solution won't work for sparse variables or other cases.

If you look at the implementation you can see that it's storing the m and v EMAs in these "slots". So something like this seems to do it:

class MyAdam(tf.train.AdamOptimizer):
    def _apply_dense(self, grad, var):
        m = self.get_slot(var, "m")
        v = self.get_slot(var, "v")

        m_hat = m/(1-self._beta1_power)
        v_hat = v/(1-self._beta2_power)

        step = m_hat/(v_hat**0.5 + self._epsilon_t)

        # Use a histogram summary to monitor it during training.
        tf.summary.histogram("hist", step) 

        return super(MyAdam,self)._apply_dense(grad, var)

step here will be in the interval [-1, 1]; that's what gets multiplied by the learning rate to determine the actual step applied to the parameters.

There's often no node in the graph for it, because one big training_ops.apply_adam op does everything.

Here I'm just creating a histogram summary from it. But you could stick it in a dictionary attached to the object and read it later or do whatever you want with it.

Dropping that into mnist_deep.py, and adding some summaries to the training loop:

all_summaries = tf.summary.merge_all()  
file_writer = tf.summary.FileWriter("/tmp/Adam")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy,summaries = sess.run(
                [accuracy,all_summaries],
                feed_dict={x: batch[0], y_: batch[1], 
                           keep_prob: 1.0})
            file_writer.add_summary(summaries, i)
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

Produces the following figure in TensorBoard:

[Image: TensorBoard Histogram tab, showing 8 different histogram plots]

mdaoust

  • Due to resource constraint I have to explicitly place my network onto different GPUs, and this subclassing hack gives me an error: `Could not satisfy explicit device specification '/device:GPU:1' because no supported kernel for GPU devices is available`. Removing the `tf.summary.histogram` line will remove the complaint. – Ziyuan Jan 24 '18 at 15:04
  • Later versions of Tensorflow have a slightly different way of accessing the beta variables. – Kevin Jan 28 '20 at 23:35
4

In TensorFlow 2:

optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.1)  # or any other optimizer
print(optimizer.learning_rate.numpy())  # or print(optimizer.lr.numpy())

Note: This gives you the base learning rate. Refer to this answer for more details on adaptive learning rates.

Ali Salehi

  • When I call `print(model.optimizer.learning_rate)` as you suggested, I get the ExponentialDecay schedule object back. If I add `.numpy()`, then I receive `'ExponentialDecay' object has no attribute 'numpy'` – whitepanda Sep 26 '21 at 20:10
  • They might have changed the object signature. You might want to take a look at its definition in the source code. – Ali Salehi Sep 26 '21 at 22:46
  • If you use a scheduler like `tf.keras.optimizers.schedules.ExponentialDecay`, [call this object with the current training step as argument](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/LearningRateSchedule). – gizzmole Jul 05 '22 at 07:46
3

In the TensorFlow sources, the current lr for the Adam optimizer is calculated like this:

    lr = (lr_t * math_ops.sqrt(1 - beta2_power) / (1 - beta1_power))

So, try it:

    current_lr = (optimizer._lr_t * tf.sqrt(1 - optimizer._beta2_power)
                  / (1 - optimizer._beta1_power))

    eval_current_lr = sess.run(current_lr)
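On later TF 1.x releases the beta power accumulators are no longer plain attributes; a hedged variant, assuming your version exposes the (still private) _get_beta_accumulators() helper:

    beta1_power, beta2_power = optimizer._get_beta_accumulators()
    current_lr = (optimizer._lr_t * tf.sqrt(1. - beta2_power)
                  / (1. - beta1_power))

    eval_current_lr = sess.run(current_lr)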
0

For TensorFlow 2, using tf.keras.optimizers.schedules.LearningRateSchedule, inspired by this comment:

lr_schedule = tf.keras.optimizers.schedules.CosineDecay(learning_rate, total_steps)
optimizer = tf.keras.optimizers.Adam(lr_schedule)
print(optimizer.lr(optimizer.iterations))
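Since the question mentions TensorBoard, here is a hedged sketch of logging the scheduled rate every step (the log directory and the training-step placeholder are assumptions):

writer = tf.summary.create_file_writer("/tmp/lr_logs")
with writer.as_default():
    for step in range(total_steps):
        # ... run one training step here ...
        # optimizer.lr is the schedule; calling it at the current iteration
        # count gives the learning rate actually in effect.
        tf.summary.scalar("learning_rate", optimizer.lr(optimizer.iterations), step=step)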
gizzmole