I trained a model for several epochs and now want to retrain it for more epochs. How does the Adam optimizer behave in this case: will it reinitialize the time step to t = 0, or does it resume from the last saved time step?
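For context, here is a minimal sketch of how I understand the state to be stored (assuming TensorFlow 1.x; the dummy variable and loss below just stand in for my real model, and './checkpoints' is a placeholder path):

import tensorflow as tf  # assuming TensorFlow 1.x

x = tf.Variable(tf.zeros([10]))      # stand-in for the real model weights
loss = tf.reduce_sum(tf.square(x))   # stand-in for the real loss

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
train_op = optimizer.minimize(loss)

# Adam keeps its state in ordinary variables: the per-weight slots m and v,
# plus the scalar accumulators beta1_power / beta2_power (which encode t):
for var in tf.global_variables():
    if 'Adam' in var.op.name or 'beta' in var.op.name:
        print(var.op.name)

# A Saver built over all variables (the default) checkpoints this state too:
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('./checkpoints'))

My understanding (which I would like confirmed) is that if the Saver covers all variables, t effectively continues from the checkpoint, whereas restoring only tf.trainable_variables() would reset Adam's state to t = 0.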
a) The TensorFlow documentation shows the following calculations. Is there a way I can add these metrics to TensorBoard?
t <- t + 1
lr_t <- learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g
variable <- variable - lr_t * m_t / (sqrt(v_t) + epsilon)
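Regarding a), the best I have come up with is to recompute lr_t from the documented formula and to log the m/v slot variables as summaries; a minimal sketch, assuming TensorFlow 1.x (the dummy x/loss and the './logs' directory are placeholders for my real setup):

import tensorflow as tf  # assuming TensorFlow 1.x

base_lr, beta1, beta2 = 1e-3, 0.9, 0.999  # must match the AdamOptimizer arguments

x = tf.Variable(tf.zeros([10]))      # stand-in for the real model weights
loss = tf.reduce_sum(tf.square(x))   # stand-in for the real loss

global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.AdamOptimizer(learning_rate=base_lr, beta1=beta1, beta2=beta2)
train_op = optimizer.minimize(loss, global_step=global_step)

# Recompute the bias-corrected step size from the documented formula
# lr_t = learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t).
# Only meaningful for t >= 1, i.e. after the first training step has run.
t = tf.cast(global_step, tf.float32)
lr_t = base_lr * tf.sqrt(1.0 - tf.pow(beta2, t)) / (1.0 - tf.pow(beta1, t))
tf.summary.scalar('adam/lr_t', lr_t)

# The first and second moment estimates live in slot variables created by
# minimize(), so they can be logged as per-weight histograms:
for var in tf.trainable_variables():
    m = optimizer.get_slot(var, 'm')
    v = optimizer.get_slot(var, 'v')
    if m is not None and v is not None:
        tf.summary.histogram(var.op.name + '/adam_m', m)
        tf.summary.histogram(var.op.name + '/adam_v', v)

merged = tf.summary.merge_all()
writer = tf.summary.FileWriter('./logs')  # hypothetical log directory

One caveat: this uses global_step as t, which only matches Adam's internal beta1_power / beta2_power accumulators if both are saved and restored together, which is exactly the part I am unsure about.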
Similar questions (question1 and question2) have gone unanswered for a long time.
I am actually seeing a problem with the error rate when re-training the model from the last checkpoint, and I am not sure what exactly happens with the Adam optimizer in this case.