
I would like to keep track of the learning rate while training with an Estimator (a TPUEstimator on a TPU, of all things). I am experimenting with the Colab MNIST example. I figured I would create a training hook, which would log the learning rate. Here is my code:

class TrainHook(tf.train.SessionRunHook):
    def __init__(self, optimizer):
        self.optimizer = optimizer

    def after_create_session(self, session, coord):
        self.session = session

    def before_run(self, run_context):
        # optimizer is a CrossShardOptimizer (see notebook), hence ._opt
        logger.info('Learning rate is {}'.format(
            self.optimizer._opt._lr.eval(session=self.session)))

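For completeness, the hook is wired in at the end of model_fn, roughly like this (just a sketch, not my exact code; I am also assuming here that TPUEstimatorSpec accepts training_hooks in this version):

# Sketch of how the hook is returned from model_fn (simplified; loss and
# train_op are built as in the notebook). Note: training_hooks on
# TPUEstimatorSpec is an assumption here, not something the notebook uses.
train_op = optimizer.minimize(loss, global_step=step)
return tf.contrib.tpu.TPUEstimatorSpec(
    mode=mode,
    loss=loss,
    train_op=train_op,
    training_hooks=[TrainHook(optimizer)])
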
The optimizer was created like this in model_fn (copied from the notebook):

step = tf.train.get_or_create_global_step()
lr = 0.0001 + tf.train.exponential_decay(0.01, step, 2000//8, 1/math.e)
optimizer = tf.train.AdamOptimizer(lr)
if params['use_tpu']:
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)

Unfortunately, when I run this code, I get the following error: "Error recorded from training_loop: Operation 'add_1' has been marked as not fetchable." Apparently _opt._lr is that add_1 operation, because of exponential_decay (the learning rate is the sum of a constant and the decay tensor). At least it is on the TPU; when I run the code with regular TensorFlow installed on my laptop, it is an add operation instead. Could this be the difference?
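
Maybe the more idiomatic hook pattern would be to request the tensor in before_run and read the result in after_run, instead of calling eval on it. This is just a sketch; I do not know whether it sidesteps the fetchability problem, since the tensor presumably still lives inside the TPU training loop:

# Sketch: the standard SessionRunHook way of fetching a tensor. The fetch is
# requested in before_run() and read from run_values in after_run().
# Whether this avoids the "not fetchable" error on the TPU is the open question.
class TrainHook(tf.train.SessionRunHook):
    def __init__(self, optimizer):
        self.optimizer = optimizer

    def before_run(self, run_context):
        # CrossShardOptimizer wraps the real optimizer, hence ._opt
        return tf.train.SessionRunArgs(fetches=self.optimizer._opt._lr)

    def after_run(self, run_context, run_values):
        logger.info('Learning rate is {}'.format(run_values.results))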

I know that even if I could get it, _lr is not the current, but the base learning rate. But it's a start. :)
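
Of course, since the schedule is fully known, the current value could also be recomputed in Python from the global step. The sketch below just mirrors the exponential_decay formula (with the default staircase=False) and reads the step back from the checkpoint:

import math

# Sketch: recompute the current learning rate outside the graph, mirroring
# lr = 0.0001 + exponential_decay(0.01, step, 2000 // 8, 1 / math.e).
# exponential_decay(base, step, decay_steps, decay_rate) computes
# base * decay_rate ** (step / decay_steps) when staircase=False (the default).
def current_learning_rate(step):
    return 0.0001 + 0.01 * (1 / math.e) ** (step / (2000 // 8))

# The global step can be read back from the latest checkpoint, e.g.:
# step = estimator.get_variable_value('global_step')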

Note: _lr and _lr_t behave identically in that both result in the error above.

I use TensorFlow v1.14.

David Nemeskey
  • Do you get the same error when you use `optimizer.variables()`? see: https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/AdamOptimizer#variables – Tyler Feb 18 '20 at 06:13
  • @Tyler No, that works, and I can run all the tensors therein. However, the learning rate is not among the variables, only the betas... (btw. sorry for the late answer) – David Nemeskey Mar 31 '20 at 12:12
  • @DavidNemeskey hi did you get the answer? I have the same question, please help. Thanks in advance. – user1024 Apr 25 '22 at 13:10
  • @user1024 I haven't got any answers, but it doesn't really matter as I have long since switched to PyTorch. So I am sorry, but I cannot really help. – David Nemeskey Apr 26 '22 at 15:18

0 Answers