
Training on a TPU, I tried passing experimental_steps_per_execution to model.compile(...). I do see a big speedup, but with the exact same learning rate schedule I noticed a 2-3% drop in accuracy by the end of training. In summary, the only thing I changed is that parameter.
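
For reference, the change amounts to something like this (the model below is just a stand-in for mine, and 32 is an example value):

    import tensorflow as tf

    # Stand-in model for illustration; in my real runs the only change
    # between the two experiments is experimental_steps_per_execution.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
        experimental_steps_per_execution=32,  # example value; the default is 1
    )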

I have not yet found any detailed documentation on this parameter. It clearly speeds up training, but I am unclear about the algorithmic difference, especially how the gradients are computed and how the gradient descent steps are applied.

Does anyone know more about this? Do I need to tune other things, such as my learning rate or batch_size?

kawingkelvin

1 Answer


experimental_steps_per_execution controls how many batches are processed on the device before control returns to the host and callbacks are run. If your learning rate schedule is implemented via callbacks, it is possible that the learning rate is not being updated as often as you expect. Could you try setting experimental_steps_per_execution to a value that aligns with (i.e., evenly divides) your scheduler's update interval?

For instance, if the learning rate updates every 100 steps, then set experimental_steps_per_execution=100.
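
If that is the cause, another way to sidestep the interaction entirely is to attach the schedule to the optimizer as a tf.keras.optimizers.schedules.LearningRateSchedule. The schedule is evaluated against the optimizer's iteration counter inside the compiled train step, so it keeps updating every step regardless of experimental_steps_per_execution. A minimal sketch (the model and decay values here are placeholders, not your actual settings):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # Decays the learning rate every 100 optimizer steps. Because the
    # schedule lives inside the optimizer rather than a callback, it is
    # applied per step even when callbacks only fire once per execution.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3,
        decay_steps=100,
        decay_rate=0.9,
        staircase=True,
    )

    model.compile(
        optimizer=tf.keras.optimizers.Adam(lr_schedule),
        loss="sparse_categorical_crossentropy",
        experimental_steps_per_execution=100,  # matches the 100-step interval
    )

With the schedule inside the optimizer, callbacks are no longer on the critical path for the learning rate, so you can raise experimental_steps_per_execution for throughput without affecting the schedule.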

Allen Wang