I would like to keep track of the learning rate while training with an Estimator (a TPUEstimator on a TPU, of all things). I am experimenting with the Colab MNIST example. I figured I would create a training hook, which would log the learning rate. Here is my code:
import logging
import tensorflow as tf

logger = logging.getLogger(__name__)  # a plain module-level logger

class TrainHook(tf.train.SessionRunHook):
    def __init__(self, optimizer):
        self.optimizer = optimizer

    def after_create_session(self, session, coord):
        self.session = session

    def before_run(self, run_context):
        # optimizer is a CrossShardOptimizer (see notebook), hence ._opt
        logger.info('Learning rate is {}'.format(
            self.optimizer._opt._lr.eval(session=self.session)))
optimizer was created like this in model_fn (copied from the notebook):
step = tf.train.get_or_create_global_step()
lr = 0.0001 + tf.train.exponential_decay(0.01, step, 2000//8, 1/math.e)
optimizer = tf.train.AdamOptimizer(lr)
if params['use_tpu']:
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
Unfortunately, when I run this code, I get the following error: "Error recorded from training_loop: Operation 'add_1' has been marked as not fetchable." Apparently _opt._lr is an add_1 operation, because of exponential_decay. At least it is on the TPU; with regular TensorFlow on my laptop it is an add op instead; could this be the difference?
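For what it's worth, here is my understanding of where that add op comes from; a minimal sketch outside the notebook, with the bare AdamOptimizer (not wrapped in CrossShardOptimizer):

import math
import tensorflow as tf  # 1.x, graph mode

step = tf.train.get_or_create_global_step()
decayed = tf.train.exponential_decay(0.01, step, 2000//8, 1/math.e)
lr = 0.0001 + decayed                   # this '+' is what creates the add / add_1 op
optimizer = tf.train.AdamOptimizer(lr)  # AdamOptimizer stores that tensor as _lr
print(optimizer._lr)                    # something like Tensor("add:0", ...), not a Python float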
I know that even if I could get it, _lr is not the current but the base learning rate. But it's a start. :)
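If I could read the global step some other way, I could in principle recompute the current rate from the same schedule outside the graph; a sketch of that idea (the helper below is hypothetical, not from the notebook):

import math

def current_lr(global_step):
    # mirrors lr = 0.0001 + exponential_decay(0.01, step, 2000//8, 1/math.e), non-staircase
    return 0.0001 + 0.01 * math.exp(-global_step / (2000 // 8))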
Note: _lr and _lr_t behave identically in that both result in the error above.
I am using TensorFlow v1.14.