I'm trying to use my preemptible Cloud TPU v3-256 Pod from a Google Compute Engine VM with TensorFlow 2.1, but it doesn't work: TPUClusterResolver throws a "Could not lookup TPU metadata" error.

Individual (non-preemptible) TPUs work fine as long as I pass the grpc:// address rather than the TPU name. However, when I pass the TPU name, neither the individual TPUs nor my TPU Pod work, and both throw this error.

Can someone help me fix this issue?

Code:

import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu-name', zone='europe-west4-a', project='my-project')  # The zone, project and TPU name are correct

Output:

ValueError: Could not lookup TPU metadata from name 'my-tpu-name'. Please double check the tpu argument in the TPUClusterResolver constructor.
Exception: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=True
from the Google Compute Enginemetadata service. Response: {'metadata-flavor': 'Google',
'date': 'Thu, 28 May 2020 17:42:35 GMT', 'content-type': 'text/html; charset=UTF-8',
'server': 'Metadata Server for VM', 'content-length': '1629', 'x-xss-protection': '0',
'x-frame-options': 'SAMEORIGIN', 'status': '404'}
  • I suspect the structure of the TPU name is not supported by the API/client. Change it to something very simple: lowercase, with no other symbols, something like "tpuxyz". – fabrizioM Jun 08 '20 at 17:08

1 Answer

I suspect a mismatch between the Compute Engine VM and the TPU in one of the following: TensorFlow version, zone, or project. If you create both the TPU and the GCE VM with the same TensorFlow version (2.1 or 2.2), in the same project and zone, you can just provide the TPU name to TPUClusterResolver and it should work fine:

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu-name') 

You can omit the TPU name if you set the TPU_NAME environment variable (export TPU_NAME=my-tpu-name) on your VM.
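As a rough sketch of the environment-variable approach, the snippet below collects the resolver arguments from the environment and then connects to the TPU. Only TPU_NAME comes from the answer above; the TPU_ZONE and TPU_PROJECT variable names and the resolver_kwargs and connect helpers are hypothetical, made up for this example. The connection calls assume the TF 2.1 API names and have not been verified against a live TPU Pod.

```python
import os


def resolver_kwargs(env=None):
    """Build keyword arguments for TPUClusterResolver from environment variables."""
    env = os.environ if env is None else env
    kwargs = {}
    # TPU_NAME is the variable mentioned in the answer; TPU_ZONE and
    # TPU_PROJECT are hypothetical names chosen only for this sketch.
    for key, var in (("tpu", "TPU_NAME"),
                     ("zone", "TPU_ZONE"),
                     ("project", "TPU_PROJECT")):
        value = env.get(var)
        if value:  # skip unset or empty variables
            kwargs[key] = value
    return kwargs


def connect(env=None):
    """Resolve the TPU and return a distribution strategy (TF 2.1 API names)."""
    # Imported lazily so resolver_kwargs() can be used without TensorFlow.
    import tensorflow as tf

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        **resolver_kwargs(env))
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.experimental.TPUStrategy(resolver)
```

Calling connect() on the VM after export TPU_NAME=my-tpu-name should then behave like passing tpu='my-tpu-name' explicitly; when zone and project are left unset, the resolver has to work them out itself, which is why the VM and TPU need to share them.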

Gagik