I have set up a TPU machine on Google Cloud, and I think I did it properly because when I run ctpu status it returns RUNNING.

However, I have a Python script that I want to run on the TPU. According to the first few lines of the output in the terminal, it is still using the CPU. The output is:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7191847218438877393
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 6058516396679200559
physical_device_desc: "device: XLA_CPU device"
]

The command I am running is: python3 test.py 1 --tpu-name=$TPU_NAME

I already ran export TPU_NAME=tpu_vm1 and confirmed it with echo.

So what could I be doing wrong? How can I get the script to use the TPU instead?

Just in case, here is a redacted excerpt from my test.py script:

#
# resnet time-to-accuracy-improvement tests
#

import os
from numpy.random import seed
seed(1)
import tensorflow as tf
tf.random.set_seed(2)
import numpy
import time

from tensorflow.keras.applications.resnet50 import ResNet50
import mycallbacks

from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.layers import Input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import plot_model

...

# display device type
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

1 Answer

So you're listing the local CPU devices on the GCE VM that runs the coordinator. The TPU itself consists of remote devices that live on the TPU host, so it does not show up as a local device.
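
A quick way to see this: the TPU name resolves to a remote gRPC worker rather than a local device. A minimal sketch, assuming TF 2.x and the tpu_vm1 name from the question:

import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='tpu_vm1')
print(resolver.master())        # something like grpc://10.x.y.z:8470 -- a remote worker, not a local device
print(resolver.cluster_spec())  # the 'worker' job that hosts the TPU devices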

Please check out this Colab notebook. When you run something like:

tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

It'll spit out the TPUSystemMetadata like:

INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
...
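
Once experimental_connect_to_cluster and initialize_tpu_system have run, the remote TPU cores are visible to the runtime, so you can sanity-check them from Python. A minimal sketch, assuming TF 2.x:

print(tf.config.list_logical_devices('TPU'))
# e.g. [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), ...]

Note that device_lib.list_local_devices() will still only show the local CPU, which is expected.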

To actually have the model placed on TPU devices, just make sure to build the model under the TPU strategy, as described here.
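
For the script in the question, that roughly means building the ResNet50 model inside the strategy's scope. This is only a sketch under assumptions: TF 2.x, and the --tpu-name parsing below is hypothetical, so adapt it to however test.py actually reads its arguments:

import argparse
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50

# Hypothetical flag parsing -- mirrors the --tpu-name=$TPU_NAME flag from the question.
parser = argparse.ArgumentParser()
parser.add_argument('--tpu-name', dest='tpu_name', default=None)
args, _ = parser.parse_known_args()

# Resolve, connect to, and initialize the remote TPU system.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=args.tpu_name)
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Anything created inside the scope is placed/replicated on the TPU cores.
with strategy.scope():
    model = ResNet50(weights=None, input_shape=(224, 224, 3), classes=1000)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

A subsequent model.fit(...) then runs the training steps on the TPU instead of the local CPU.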
