Google Cloud TPUs (Tensor Processing Units) accelerate machine learning workloads developed using TensorFlow. This tag is for questions about the Google Cloud TPU service. Topics include the service user experience, issues with trainer programs written in TensorFlow, project quota issues, security, and authentication.
Questions tagged [google-cloud-tpu]
188 questions
1
vote
3 answers
TPU custom chip available with Google Cloud ML
Which type of hardware does Google Cloud ML use when running TensorFlow? Is it CPU only, or are Tensor Processing Units (custom cards) also available?
Cf. this article.

Nick
- 23
- 2
0
votes
1 answer
Cloud TPU - Migrating off of older TF versions
If I am a Cloud TPU user and I have several TPU nodes with older TF versions (<=2.6.x) which are soon to be deprecated, is it possible to get support from the Cloud TPU team with migration? Please assign this issue the highest priority possible as…
0
votes
2 answers
Error while trying to use GCP VM Instance with TPU VM
I created a VM instance in GCP with the PyTorch XLA environment.
I also created a TPU VM with tpu-vm-pt-2.0.
I SSHed into the VM instance and activated the conda environment with pytorch-xla. But when I try to run a sample script to test for the TPU,…

mr oogway
- 1
- 3
0
votes
1 answer
Access to the v4 TPUs
For our purposes, we would really like to have access to the v4 TPUs. We found the Google form and filled it out a few weeks ago, but it seems we've thrown a dart into an abyss, with no response. Is there any way to accelerate this, or another method to get…
0
votes
1 answer
Trouble connecting to GCP TPU VM
I followed the instructions to create a Cloud TPU VM and run a custom neural network, as directed by the "Run TensorFlow on TPU pod slices" guide, to a T. It's important to note that I have been able to initialize the Cloud TPUs when running this…

Brad Messer
- 1
- 1
0
votes
1 answer
Running PyTorch on Cloud TPU VM on GCP gives INVALID_ARGUMENT: No matching devices found for '/job:localservice/replica:0/task:0/device:TPU_SYSTEM:0'
I created a TPU VM on GCP.
I am following the documentation page on how to run a calculation on a Cloud TPU VM by using PyTorch.
I have set the XRT TPU device configuration in the VM with:
export XRT_TPU_CONFIG="localservice;0;localhost:51011"
I…

BioGeek
- 21,897
- 23
- 83
- 145
0
votes
1 answer
TPU not found on Google VM (jax version 0.2.16)
I'm running a TPU v3-8 VM on Google. On the VM, I installed jax with pip install "jax[tpu]==0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html.
Unfortunately, I'm getting the message No GPU/TPU found, falling back to CPU,…

BlackHawk
- 719
- 1
- 6
- 18
0
votes
1 answer
How to understand the padding rules on cloud TPU?
Cloud TPU has two padding rules on batch_size and feature_size of convolution operations, to minimize memory overhead and maximize computational efficiency (from here).
The total batch size should be a multiple of 64 (8 per TPU core), and feature…

ILS
- 1,224
- 1
- 8
- 14
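The batch-size rule quoted in the excerpt above can be illustrated with a small helper. This is only a sketch: it assumes an 8-core device (so 8 per core times 8 cores gives the multiple of 64 mentioned in the question), and the function names `round_up_to_multiple` and `padded_global_batch_size` are hypothetical, not part of any TPU library.

```python
def round_up_to_multiple(value: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= `value`."""
    if value <= 0 or multiple <= 0:
        raise ValueError("value and multiple must be positive")
    return ((value + multiple - 1) // multiple) * multiple


def padded_global_batch_size(
    batch_size: int,
    num_cores: int = 8,          # assumption: an 8-core TPU device
    per_core_multiple: int = 8,  # from the excerpt: "8 per TPU core"
) -> int:
    """Round a requested global batch size up to the padding-free multiple."""
    return round_up_to_multiple(batch_size, num_cores * per_core_multiple)


if __name__ == "__main__":
    for requested in (1, 64, 100):
        print(requested, "->", padded_global_batch_size(requested))
```

Choosing a batch size that is already such a multiple avoids the padding described in the question; the helper just shows the arithmetic.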
0
votes
1 answer
Connection refused when switching TPU version
How can I switch the TPU version for the TPU VM architecture?
When attempting to switch the software version for a TPU (TPU VM architecture, switching from tpu-vm-tf-2.6.0-pod to tpu-vm-base) using instructions found here, I get a Connection Refused exception with…

Nevus
- 1,307
- 1
- 9
- 21
0
votes
1 answer
How to fix "INVALID_ARGUMENT: Cloud TPU received an invalid argument. The "GuestAttributes" value "" was not found."?
I recently started using TPUv3-8 VMs to train language models and haven't had any issues with VMs crashing or the like. However, one of my TPU VMs seems to now have broken out of nowhere and I am completely lost.
When trying to ssh to the VM, I get…
0
votes
0 answers
TPU VM access Cloud Storage 403 forbidden when writing files
When I run my Python command to train my model on my TPU VM, it fails when writing files to Cloud Storage.
Traceback (most recent call last):
File "device_train.py", line 302, in
save(network, step, bucket, model_dir,
File…

csliu_jia
- 1
- 1
0
votes
0 answers
TPU VM ssh connect unstable or disconnect after some seconds
When I use the command proxychains gcloud alpha compute tpus tpu-vm ssh xx --zone zone to connect to the TPU VM, the connection only lasts 5 to 10 seconds.
This is a serious problem because I don't have time to execute my command.
I have checked the…

csliu_jia
- 1
- 1
0
votes
1 answer
Write to a GCP bucket from a TPU VM
I am training a BERT model using a TPU VM on GCP.
I want to use my bucket as the Datasets library cache path. I have followed instructions from
https://cloud.google.com/tpu/docs/tutorials/bert-2.x and set my bucket link in the HF_DATASETS_CACHE…

kamel gaanoun
- 13
- 3
0
votes
1 answer
TPU training fails with certain metric, succeeds on CPU
I'm trying to train a simple EfficientNet-style model on some images. Training works fine on a CPU, but when I switch to a TPU I get the following error:
(0) Invalid argument: {{function_node
__inference_train_function_38255}} Output…

dgmp88
- 537
- 6
- 13
0
votes
2 answers
How can I use a Cloud TPU with TensorFlow Lite Model Maker?
I'm training an object detection model (EfficientDet-Lite) using TensorFlow Lite Model Maker in Colab and I'd like to use a Cloud TPU. I have all the images in a GCS bucket and provide a CSV file. When I call object_detector.create I get the…

TvE
- 1,016
- 1
- 11
- 19