
I'd like to print out the learning rate for each training step of my neural network.

I know that Adam has an adaptive learning rate, but is there a way I can see this (for visualization in TensorBoard)?

kmace

  • By quickly reading the code, you can get lr: print sess.run(adam_op._lr_t), after having adam_op = tf.train.AdamOptimizer(0.1, beta1=0.5, beta2=0.5), train_op = adam_op.minimize(cost). However, I'm not sure it works in your code. Can you quickly test? – Sung Kim May 02 '16 at 22:19
  • Side note: the right way to think about Adam is not as a learning rate (scaling the gradients), but as a step size. The `learning_rate` you pass in is the maximum step size (per parameter); Adam takes steps up to that size, depending on how consistent the gradient is. – mdaoust Apr 23 '17 at 20:56
  • OK @mdaoust, but then how can I obtain the learning rate at each step? I tried Sung Kim's suggestion but it does not work, as it returns a flat line. Thanks. – Escachator Apr 29 '17 at 15:13

6 Answers

25

All the optimizers have a private variable that holds the value of the learning rate.

In AdaGrad and gradient descent it is called self._learning_rate; in Adam it is self._lr.

So you just need to print sess.run(optimizer._lr) to get this value. sess.run is needed because it is a tensor.
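For reference, a minimal TF 1.x sketch of this (the toy variable and loss are placeholders; note that depending on the TF version, optimizer._lr may be a plain Python float while optimizer._lr_t is the corresponding tensor created once minimize() has run):

import tensorflow as tf  # TF 1.x

x = tf.Variable(1.0)
loss = tf.square(x)

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(loss)  # _prepare() runs here and creates _lr_t

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    # _lr_t is the learning-rate tensor; _lr may just echo the float passed in.
    print(sess.run(optimizer._lr_t))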

Salvador Dali

  • Note: in tf2.x, with a learning rate schedule in use, you should use (for Adam) `lr = optimizer._decayed_lr(tf.float32)` – Gouda Nov 04 '19 at 06:19
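For reference, a minimal TF 2.x sketch of the `_decayed_lr` approach from the comment above (assuming the OptimizerV2-based tf.keras optimizers, where `_decayed_lr` is a private helper that evaluates any attached schedule at the current `optimizer.iterations`):

import tensorflow as tf  # TF 2.x, OptimizerV2-based optimizers

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1, decay_steps=1000, decay_rate=0.95)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# Evaluates the schedule at optimizer.iterations and casts to the requested dtype.
print(float(optimizer._decayed_lr(tf.float32)))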
12

Sung Kim's suggestion worked for me; my exact steps were:

lr = 0.1
step_rate = 1000
decay = 0.95

global_step = tf.Variable(0, trainable=False)
increment_global_step = tf.assign(global_step, global_step + 1)
learning_rate = tf.train.exponential_decay(lr, global_step, step_rate, decay, staircase=True)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=0.01)
trainer = optimizer.minimize(loss_function)

# Some code here

print('Learning rate: %f' % (sess.run(optimizer._lr)))
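For context, a hedged sketch of what the elided training loop might look like (the feed dict and step count here are assumptions, not part of the original answer); note that global_step has to be advanced, otherwise exponential_decay never actually decays:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10000):
        # Run the training op and advance global_step so the decay takes effect.
        sess.run([trainer, increment_global_step], feed_dict=feed)  # `feed` is a placeholder
        if step % 100 == 0:
            # optimizer._lr holds the learning_rate tensor passed to the constructor.
            print('Learning rate: %f' % sess.run(optimizer._lr))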
X. Serra

  • I use `GradientDescentOptimizer` and `self._learning_rate`. It does not work for me; I get the error `AttributeError: 'Operation' object has no attribute '_learning_rate'` – ARAT May 27 '18 at 04:01
4

I think the easiest thing you can do is subclass the optimizer.

It has several methods that I guess get dispatched to based on variable type. Regular dense variables seem to go through _apply_dense; this solution won't work for sparse variables or other cases.

If you look at the implementation you can see that it's storing the m and v EMAs in these "slots". So something like this seems to do it:

class MyAdam(tf.train.AdamOptimizer):
    def _apply_dense(self, grad, var):
        m = self.get_slot(var, "m")
        v = self.get_slot(var, "v")

        m_hat = m/(1-self._beta1_power)
        v_hat = v/(1-self._beta2_power)

        step = m_hat/(v_hat**0.5 + self._epsilon_t)

        # Use a histogram summary to monitor it during training.
        tf.summary.histogram("hist", step) 

        return super(MyAdam,self)._apply_dense(grad, var)

step here will be in the interval [-1, 1]; that's what gets multiplied by the learning rate to determine the actual step applied to the parameters.

There's often no node in the graph for it, because one big training_ops.apply_adam op does everything.

Here I'm just creating a histogram summary from it. But you could stick it in a dictionary attached to the object and read it later or do whatever you want with it.

Dropping that into mnist_deep.py, and adding some summaries to the training loop:

all_summaries = tf.summary.merge_all()  
file_writer = tf.summary.FileWriter("/tmp/Adam")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(20000):
        batch = mnist.train.next_batch(50)
        if i % 100 == 0:
            train_accuracy,summaries = sess.run(
                [accuracy,all_summaries],
                feed_dict={x: batch[0], y_: batch[1], 
                           keep_prob: 1.0})
            file_writer.add_summary(summaries, i)
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

Produces the following figure in TensorBoard:

[Image: TensorBoard Histogram tab, showing 8 different histogram plots]

mdaoust

  • Due to resource constraint I have to explicitly place my network onto different GPUs, and this subclassing hack gives me an error: `Could not satisfy explicit device specification '/device:GPU:1' because no supported kernel for GPU devices is available`. Removing the `tf.summary.histogram` line will remove the complaint. – Ziyuan Jan 24 '18 at 15:04
  • Later versions of Tensorflow have a slightly different way of accessing the beta variables. – Kevin Jan 28 '20 at 23:35
4

In TensorFlow 2:

optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.1)  # or any other optimizer
print(optimizer.learning_rate.numpy())  # or print(optimizer.lr.numpy())

Note: This gives you the base learning rate. Refer to this answer for more details on adaptive learning rates.

Ali Salehi

  • When I call `print(model.optimizer.learning_rate)` as you suggested, I get the ExponentialDecay schedule object back. If I add `.numpy()`, then I receive `'ExponentialDecay' object has no attribute 'numpy'` – whitepanda Sep 26 '21 at 20:10
  • They might have changed the object signature. You might want to take a look at its definition in the source code. – Ali Salehi Sep 26 '21 at 22:46
  • If you use a scheduler like `tf.keras.optimizers.schedules.ExponentialDecay`, [call this object with the current training step as argument](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/LearningRateSchedule). – gizzmole Jul 05 '22 at 07:46
3

In the TensorFlow sources, the current lr for the Adam optimizer is calculated like this:

    lr = (lr_t * math_ops.sqrt(1 - beta2_power) / (1 - beta1_power))

So, try it:

    current_lr = (optimizer._lr_t * tf.sqrt(1 - optimizer._beta2_power)
                  / (1 - optimizer._beta1_power))

    eval_current_lr = sess.run(current_lr)
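On later TF 1.x releases the beta power accumulators are no longer plain attributes; a hedged variant, assuming your version exposes the (still private) _get_beta_accumulators() helper:

    beta1_power, beta2_power = optimizer._get_beta_accumulators()
    current_lr = (optimizer._lr_t * tf.sqrt(1. - beta2_power)
                  / (1. - beta1_power))

    eval_current_lr = sess.run(current_lr)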
0

For TensorFlow 2, using tf.keras.optimizers.schedules.LearningRateSchedule, inspired by this comment:

lr_schedule = tf.keras.optimizers.schedules.CosineDecay(learning_rate, total_steps)
optimizer = tf.keras.optimizers.Adam(lr_schedule)
print(optimizer.lr(optimizer.iterations))
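Since the question mentions TensorBoard, here is a hedged sketch of logging the scheduled rate every step (the log directory and the training-step placeholder are assumptions):

writer = tf.summary.create_file_writer("/tmp/lr_logs")
with writer.as_default():
    for step in range(total_steps):
        # ... run one training step here ...
        # optimizer.lr is the schedule; calling it at the current iteration
        # count gives the learning rate actually in effect.
        tf.summary.scalar("learning_rate", optimizer.lr(optimizer.iterations), step=step)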
gizzmole