I have a deep CNN/RNN that I train on Google AI platform. I distribute the training on 8 GPUs using the tf.distribute.MirroredStrategy
. I recently upgraded my runtime version from 1.13 to 1.15 and my training is more than 2x slower than before. I read that tf.estimator.ProfilerHook
can be used to identify performance bottlenecks. So I collected the profiling information and rendered it at chrome://tracing
. I got this
A training step spends an entire 1 second on these _Send
ops. What is this? I can't find any documentation on the op or why it's in my graph. What does this mean?