
I've made a residual neural network model in Keras on Google Colab for the CIFAR-10 dataset, but it runs very slowly on TPU hardware.

I have another, regular convolutional neural network that runs fine on Google Colab. That model uses the Keras Sequential API, while the residual network uses the Functional API; I'm not sure whether that is the issue. I've already tried changing the batch size, and that did not help. The link to my program is below.

https://colab.research.google.com/github/valentinocc/Keras_cifar10/blob/master/keras_rnn_cifar10.ipynb#scrollTo=7Jc51Dbac2MC

I expect each epoch to finish in well under one minute (usually around 10 seconds at most), but each mini-batch seems to take a full minute on its own to complete (and there are many mini-batches per epoch).

  • I found a way to make some keras resnet demo code run fast on tpu, so I will work backwards from there and answer my own question when I figure the problem out. – valentinocc Jun 16 '19 at 03:07

2 Answers


It seems the issue has to do with the choice of optimizer. Using tensorflow.keras.optimizers.Adam allowed the TPU to run properly, whereas tensorflow.train.AdamOptimizer ran very slowly.
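As a rough sketch of the difference (build_resnet here is just a placeholder for your own Functional API model builder, not code from my notebook):

import tensorflow as tf

model = build_resnet()  # placeholder: your Functional API ResNet

# Runs at normal TPU speed: the tf.keras optimizer.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Ran very slowly in my case: the legacy tf.train optimizer.
# model.compile(optimizer=tf.train.AdamOptimizer(),
#               loss='sparse_categorical_crossentropy',
#               metrics=['accuracy'])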

However, the problem becomes harder to fix when using fit_generator with an ImageDataGenerator for data augmentation (as opposed to the plain fit function); a sketch of that setup is shown below. The ImageDataGenerator, the Keras Functional API, and TPU hardware do not seem to work well together: tf.keras.optimizers.Adam raises runtime errors, and tf.train.AdamOptimizer runs only as fast as a CPU. I think the solution here is to use another framework with a GPU, or to try TensorFlow without Keras.
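For reference, the augmented training loop that hit these problems looked roughly like this (a sketch only; the augmentation parameters and batch size are placeholders, not the exact values from my notebook):

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

model = build_resnet()  # placeholder: the Functional API ResNet, compiled as above

# Placeholder augmentation settings.
datagen = ImageDataGenerator(rotation_range=15,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

# This is the combination that misbehaves on TPU: a Functional API model
# trained through fit_generator on an ImageDataGenerator stream.
model.fit_generator(datagen.flow(x_train, y_train, batch_size=128),
                    steps_per_epoch=len(x_train) // 128,
                    epochs=10,
                    validation_data=(x_test, y_test))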

– valentinocc

It seems like your model isn't running on the TPU hardware but rather on the CPU. To run training/prediction on a TPU with a TensorFlow Keras model, create a TPUStrategy and compile your model within that strategy's scope:

import tensorflow as tf

# Locate the TPU, connect to it, and initialize it.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Distribution strategy that places variables and training on the TPU.
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Build and compile the model inside the strategy scope so it runs on the TPU.
with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam',
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['sparse_categorical_accuracy'])
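
Training then proceeds with a normal fit call outside the scope; as a sketch (x_train and y_train here stand in for your CIFAR-10 arrays):

model.fit(x_train, y_train,
          epochs=10,
          batch_size=128,
          validation_data=(x_test, y_test))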

For more info, please follow the TensorFlow TPU guide.

– Gagik