I have set up a TPU machine on Google Cloud, and I think I did it properly because when I run ctpu status it returns RUNNING.

However, I have a Python script that I want to run on the TPU. According to the first few lines of the output in the terminal, it is still using the CPU. The output is:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7191847218438877393
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 6058516396679200559
physical_device_desc: "device: XLA_CPU device"
]

The command I am running is: python3 test.py 1 --tpu-name=$TPU_NAME

I already ran export TPU_NAME=tpu_vm1 and confirmed it with echo.

So what could I be doing wrong? How can I get the script to use the TPU instead?

Just in case, here is a redacted excerpt from my test.py script:

#
# resnet time-to-accuracy-improvement tests
#

import os
from numpy.random import seed
seed(1)
import tensorflow as tf
tf.random.set_seed(2)
import numpy
import time

from tensorflow.keras.applications.resnet50 import ResNet50
import mycallbacks

from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.layers import Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import plot_model

...

# display device type
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

1 Answer

So you're listing the local CPU devices on the GCE VM that runs the coordinator. The TPU itself consists of remote devices that live on the TPU host, so it does not show up as a local device.
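
A quick way to see this: the TPU name resolves to a remote gRPC worker rather than a local device. A minimal sketch, assuming TF 2.x and the tpu_vm1 name from the question:

import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='tpu_vm1')
print(resolver.master())        # something like grpc://10.x.y.z:8470 -- a remote worker, not a local device
print(resolver.cluster_spec())  # the 'worker' job that hosts the TPU devices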

Please check out this Colab notebook. When you run something like:

tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

It'll spit out the TPUSystemMetadata like:

INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
...
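
Once experimental_connect_to_cluster and initialize_tpu_system have run, the remote TPU cores are visible to the runtime, so you can sanity-check them from Python. A minimal sketch, assuming TF 2.x:

print(tf.config.list_logical_devices('TPU'))
# e.g. [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), ...]

Note that device_lib.list_local_devices() will still only show the local CPU, which is expected.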

To actually have the model placed on TPU devices, just make sure to build the model under the TPU strategy, as described here.
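
For the script in the question, that roughly means building the ResNet50 model inside the strategy's scope. This is only a sketch under assumptions: TF 2.x, and the --tpu-name parsing below is hypothetical, so adapt it to however test.py actually reads its arguments:

import argparse
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50

# Hypothetical flag parsing -- mirrors the --tpu-name=$TPU_NAME flag from the question.
parser = argparse.ArgumentParser()
parser.add_argument('--tpu-name', dest='tpu_name', default=None)
args, _ = parser.parse_known_args()

# Resolve, connect to, and initialize the remote TPU system.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=args.tpu_name)
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Anything created inside the scope is placed/replicated on the TPU cores.
with strategy.scope():
    model = ResNet50(weights=None, input_shape=(224, 224, 3), classes=1000)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

A subsequent model.fit(...) then runs the training steps on the TPU instead of the local CPU.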
