While training on a TPU, I tried passing experimental_steps_per_execution to model.compile(...). I do see a big speedup, but with the exact same learning rate schedule I get a 2-3% drop in accuracy at the end of training. In summary, the only thing I changed is that parameter.
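For reference, here is a minimal sketch of my setup (the model, optimizer, and the value 64 are placeholders, not my actual configuration; this assumes TF 2.3, where the argument still had the experimental_ prefix):

```python
import tensorflow as tf

# Standard TPU initialization.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Placeholder model, not my real architecture.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
        # The only change between runs: with this set (e.g. to 64),
        # training is much faster but final accuracy drops by 2-3%.
        experimental_steps_per_execution=64,
    )
```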
I have not found any detailed documentation on this parameter. While it clearly speeds up training, I am unclear about the algorithmic difference, especially how the gradients are computed and how the gradient descent steps are applied.
Does anyone know more about this? Do I need to tune other things such as my learning rate or batch_size?