I don't understand whether MirroredStrategy has any impact on the training outcome.
In other words: is a model trained on a single device the same as a model trained on multiple devices?
I think it should be the same model, because it's just a distributed calculation of the gradients, isn't it?
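To make that reasoning concrete, here is a minimal pure-Python sketch of synchronous data parallelism. It is a hypothetical stand-in, not the tf.distribute API: the toy model (y_hat = w * x with a mean-squared-error loss) and the helpers `grad_mse`, `split_evenly` and `mirrored_step` are made up for illustration. It assumes every replica starts from identical weights, the global batch divides evenly across replicas, and the per-replica gradients are averaged before a single shared update. Under those assumptions the averaged gradient equals the full-batch gradient, so both runs follow the same trajectory up to floating-point rounding.

```python
# Hypothetical illustration of synchronous data parallelism; NOT the tf.distribute API.
# Toy model: y_hat = w * x, loss = mean squared error over a batch.


def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) w.r.t. w, averaged over this batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def split_evenly(seq, num_replicas):
    """Split a global batch into equal per-replica shards (assumes divisibility)."""
    size = len(seq) // num_replicas
    return [seq[i * size:(i + 1) * size] for i in range(num_replicas)]


def single_device_step(w, xs, ys, lr):
    """One gradient-descent step on the whole batch."""
    return w - lr * grad_mse(w, xs, ys)


def mirrored_step(w, xs, ys, lr, num_replicas):
    """One step where each replica sees a shard and gradients are averaged."""
    x_shards = split_evenly(xs, num_replicas)
    y_shards = split_evenly(ys, num_replicas)
    # Each replica holds an identical copy of w and computes a gradient on its shard.
    per_replica = [grad_mse(w, xr, yr) for xr, yr in zip(x_shards, y_shards)]
    # Stand-in for the all-reduce: average so every replica applies the same update.
    g = sum(per_replica) / num_replicas
    return w - lr * g


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 7.9]

w_single = w_mirrored = 0.0
for _ in range(50):
    w_single = single_device_step(w_single, xs, ys, lr=0.01)
    w_mirrored = mirrored_step(w_mirrored, xs, ys, lr=0.01, num_replicas=4)

print(w_single, w_mirrored)
```

The equivalence depends on how the loss is normalized: with equal shards, the mean of per-shard means is the global mean. In a custom TensorFlow training loop the per-replica gradients are summed rather than averaged, which is why the TensorFlow guide recommends scaling the loss by the global batch size (for example with tf.nn.compute_average_loss). Real runs can still differ slightly because of floating-point reduction order and layers that compute statistics per replica, such as batch normalization.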