I am trying to train a model on Google Colab to experiment with training on a TPU. However, I run into the following error:

AttributeError                            Traceback (most recent call last)

<ipython-input-82-e74efc36d872> in <module>()
----> 1 tpu_model.fit_generator(training_set, steps_per_epoch = 8000, epochs = 25)

2 frames

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/callbacks.py in configure_callbacks(callbacks, model, do_validation, batch_size, epochs, steps_per_epoch, samples, verbose, count_mode, mode)
    118   callback_list.model.stop_training = False
    119   # pylint: disable=protected-access
--> 120   if callback_list.model._ckpt_saved_epoch is not None:
    121     # The attribute `_ckpt_saved_epoch` is supposed to be None at the start of
    122     # training (it should be made None at the end of successful multi-worker

AttributeError: 'KerasTPUModel' object has no attribute '_ckpt_saved_epoch'

The error occurs when I run the following code:

    import tensorflow as tf
    from tensorflow.keras import layers
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    import os
    import zipfile
    print(tf.VERSION)

    local_zip = '/home/cats_and_dogs_filtered.zip'
    zip_ref = zipfile.ZipFile(local_zip, 'r')
    zip_ref.extractall('/home')
    zip_ref.close()

    def create_model():
        classifier = tf.keras.models.Sequential()

        classifier.add(layers.Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
        classifier.add(layers.MaxPooling2D(pool_size=(2, 2)))

        classifier.add(layers.Conv2D(32, (3, 3), activation= 'relu'))
        classifier.add(layers.MaxPooling2D(pool_size=(2, 2)))

        classifier.add(layers.Flatten())

        classifier.add(layers.Dense(units=128, activation= 'relu'))
        classifier.add(layers.Dense(units=1, activation= 'sigmoid'))

        return classifier

    train_datagen = ImageDataGenerator(rescale = 1./255, shear_range = 0.2, zoom_range = 0.2, horizontal_flip = True)
    training_set = train_datagen.flow_from_directory('/home/cats_and_dogs_filtered/train', target_size = (64, 64), batch_size = 32, class_mode = 'binary')

    model = create_model()

    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
    tpu_model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    tpu_model.save_weights('./tpu_model.h5', overwrite=True)

    tpu_model.fit_generator(training_set, steps_per_epoch = 8000, epochs = 25)

I am not sure what is going on. I used similar code to train the model on CPU, where training takes a long time.
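My reading of the traceback is that the callback setup reads `_ckpt_saved_epoch` from whatever model object it is given, and the `KerasTPUModel` wrapper does not define that attribute. This is only my guess. Here is a minimal plain-Python sketch of that failure mode; `PlainModel`, `WrappedModel`, and `try_configure` are made-up names for illustration, and none of this is actual TensorFlow code:

```python
class PlainModel:
    """Hypothetical stand-in for a regular Keras model."""

    def __init__(self):
        self.stop_training = False
        # The attribute that the callback setup reads before training starts.
        self._ckpt_saved_epoch = None


class WrappedModel:
    """Hypothetical stand-in for a wrapper such as KerasTPUModel."""

    def __init__(self, inner):
        self.inner = inner
        self.stop_training = False
        # Note: _ckpt_saved_epoch is never set on the wrapper itself.


def configure_callbacks(model):
    """Mimics the attribute check shown at line 120 of the traceback."""
    model.stop_training = False
    if model._ckpt_saved_epoch is not None:
        return "resume"
    return "fresh start"


def try_configure(model):
    """Runs the check and reports an AttributeError instead of raising it."""
    try:
        return configure_callbacks(model)
    except AttributeError as exc:
        return "AttributeError: " + str(exc)


plain_result = try_configure(PlainModel())
wrapped_result = try_configure(WrappedModel(PlainModel()))
print(plain_result)
print(wrapped_result)
```

In this sketch the plain object passes the check while the wrapper fails with the same kind of error as above, which would match the CPU run not hitting it. I do not know whether this is what actually happens inside TensorFlow.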

SilverTear