
I am currently training my model on a TPU. Unfortunately, I get an error when using TensorBoard together with the TPU. If I only use the TPU, everything works. If I use the GPU and TensorBoard, everything works too. I am using Google Colab.

%tensorflow_version 2.x
import tensorflow as tf
print("Tensorflow version " + tf.__version__)
TPU_ACTIVATED = True

if TPU_ACTIVATED:
  try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
  except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

  tf.config.experimental_connect_to_cluster(tpu)
  tf.tpu.experimental.initialize_tpu_system(tpu)
  tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
...

from pathlib import Path
import datetime

SAVEPATH = "/content/drive/My Drive/checker/saved"
logdir = Path(SAVEPATH, "logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
...
CALLBACKS = [tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)]
...
hist = model.fit(..., callbacks=CALLBACKS)  # here is the error

Error:

UnimplementedError: File system scheme '[local]' not implemented (file: '/content/drive/My Drive/Dataset/checker/saved/logs/20201117-134736/train')
    Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.

Why does this error occur and how do I fix it?

1 Answer


TPUs can only read and write files directly from GCS (Google Cloud Storage), not from the Colab VM's local filesystem or a mounted Drive path. Could you use a GCS bucket as your logdir instead? e.g. logdir='gs://GCS_BUCKET_NAME/...'
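For example, building the log directory as a GCS path instead of a local one might look like the sketch below (the bucket name `GCS_BUCKET_NAME` is a placeholder; substitute your own bucket):

```python
import datetime

# Placeholder bucket name; replace with your own GCS bucket.
BUCKET = "gs://GCS_BUCKET_NAME"

# Build the path as a plain string rather than with pathlib.Path:
# Path("gs://bucket", ...) collapses the "//" in the scheme,
# producing "gs:/bucket/..." which GCS cannot resolve.
logdir = BUCKET + "/logs/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# The TensorBoard callback then writes its event files to GCS,
# which the TPU workers can reach:
# CALLBACKS = [tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)]
print(logdir)
```

Note that the same restriction applies to checkpoints and datasets: anything the TPU reads or writes during `model.fit` should live on GCS.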

Also, make sure your TPU has write access to your GCS bucket: https://cloud.google.com/tpu/docs/storage-buckets#storage_access

jysohn