Questions tagged [google-cloud-tpu]

Google Cloud TPUs (Tensor Processing Units) accelerate machine learning workloads developed using TensorFlow. This tag is used for questions about using the Google Cloud TPU service. Topics can range from the service user experience and issues with a trainer program written in TensorFlow to project quota issues, security, authentication, etc.

188 questions
4
votes
1 answer

TensorFlow object detection training error with TPU

I'm following along with Google's object detection on a TPU post and have hit a wall when it comes to training. Looking at the job logs, I can see that ml-engine runs a ton of pip installs for various packages, provisions a TPU, and then submits the…
Gshock
  • 317
  • 1
  • 4
  • 13
3
votes
1 answer

Why is the TPU not recognized on my Google Cloud TPU VM instance?

I have launched a Google Cloud TPU VM instance and installed the latest version of JAX, but it cannot see my TPU. Following the instructions at https://cloud.google.com/tpu/docs/troubleshooting/trouble-jax I encounter the following: >>> import…
3
votes
1 answer

What are the requirements for allocating a TPU Pod under the TPU VM architecture?

When allocating a TPU under the TPU VM architecture, pod versions such as tpu-vm-tf-2.6.2-pod are available as TPU software versions. When selecting a pod software version and following the instructions at Run JAX code on TPU Pod Slice, jax.device_count()…
Nevus
  • 1,307
  • 1
  • 9
  • 21
3
votes
1 answer

Google Cloud - TPU Node Resource Name

I'm trying to create a Google Cloud TPU node using the TPU client API, and I cannot figure out the parent resource name of a TPU node in Google Cloud. Below you can find the full code I'm using to create the node and I'm struggling to understand wht…
TheDude
  • 51
  • 1
  • 4
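
For readers landing on the question above: Cloud TPU resource names follow the usual Google Cloud pattern, where the parent of a node is a project plus a location (zone). The snippet below is a minimal sketch of how those strings are assembled. The helper names `tpu_parent` and `tpu_node_name`, and the project, zone, and node IDs, are hypothetical placeholders and are not taken from the asker's code.

```python
def tpu_parent(project_id: str, zone: str) -> str:
    """Build the parent resource name under which TPU nodes are created."""
    return f"projects/{project_id}/locations/{zone}"


def tpu_node_name(project_id: str, zone: str, node_id: str) -> str:
    """Build the full resource name of a single TPU node."""
    return f"{tpu_parent(project_id, zone)}/nodes/{node_id}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(tpu_parent("my-project", "us-central1-b"))
    print(tpu_node_name("my-project", "us-central1-b", "my-tpu"))
```

The parent string is what a create-node request would typically take as its `parent` argument, with the node ID passed separately. Check the client library version you use for the exact request shape.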
3
votes
1 answer

How can I attach a static external IP address to a GCP Cloud TPU VM?

I want to use a static reserved IP address with a Cloud TPU VM. For a regular (non-TPU) Compute Engine instance, I can just use the web interface and go to the "External IP addresses" tab of the "VPC Network" page, select an existing external IP (or…
3
votes
1 answer

Training seq2seq model on Google Colab TPU with big dataset - Keras

I'm trying to train a sequence-to-sequence model for machine translation using Keras on a Google Colab TPU. I have a dataset which I can load in memory, but I have to preprocess it to feed it to the model. In particular I need to convert the target…
3
votes
2 answers

Use TPU v3 in Google Colab Pro

Is there a way to use a TPU v3 instead of a TPU v2 in Google Colab Pro? Unfortunately, I get the error message "Compilation failure: Ran out of memory in memory space hbm. Used 8.29G of 7.48G hbm. Exceeded hbm capacity by 825.60M." with the TPU v2,…
user14588808
3
votes
1 answer

Error Training Keras Model on Google Colab using TPU runtime

I am trying to create and train my CNN model using TPU in Google Colab. I was planning to use it for classifying dogs and cats. The model works using GPU/CPU runtime but I have trouble running it on TPU runtime. Here's the code for creating my…
3
votes
3 answers

Colab+TPU not supporting TF 2.3.0 tf.keras.layers.experimental.preprocessing

I was updating my model using TF 2.3.0 on Colab+TPU based on https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/, specifically following the Data augmentation and Transfer learning from pre-trained weights…
Daviddd
  • 761
  • 2
  • 12
  • 37
3
votes
1 answer

BERT fine-tuning with Estimators on TPUs on colab TypeError: unsupported operand type(s) for *=: 'NoneType' and 'int'

I wrote a Jupyter notebook on Google's Colab to fine-tune (for text classification) a version of BERT that I had already pretrained on Arabic only, and I couldn't get around this error when the training starts. I followed the notebook given by Google on…
3
votes
1 answer

Memory reduction TensorFlow TPU v2/v3 bfloat16

My model is too big to get a batch >64 with the normal v2 TPU devices. On the troubleshooting site it is mentioned that upcoming TensorFlow versions will have bfloat16 support. Are the newly supported TF versions 1.9-1.12 capable of using bfloat16 now…
3
votes
1 answer

InfeedEnqueueTuple issue when trying to restore updated BERT model checkpoint using Cloud TPU

I'd appreciate any help on the below, thank you in advance. I made a copy of Google Bert's notebook on fine-tuning and trained the SQUAD dataset on it using Cloud TPU and Bucket. The predictions on the dev set are ok, so I downloaded the checkpoint,…
3
votes
1 answer

TPU runs as slow as CPU when using keras_to_tpu_model in colab

I use tf.contrib.tpu.keras_to_tpu_model to make my code run on TPU, but it took 170 hours to finish an epoch, while CPU took the same time and GPU took only 40 hours per epoch. I tried to adjust the batch size but nothing changed. And I've…
DiIli
  • 137
  • 1
  • 3
  • 10
3
votes
1 answer

How to find out more about the Cloud TPU device you are running your programs against?

Whether we are using Google Colab or accessing Cloud TPUs directly, the below program gives only limited information about the underlying TPUs: import os import tensorflow as tf tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR'] print ('TPU…
Mani Sarkar
  • 115
  • 2
  • 9
3
votes
1 answer

ValueError: Operation u'tpu_140462710602256/VarIsInitializedOp' has been marked as not fetchable

The code works fine on GPU and CPU. But when I use the keras_to_tpu_model function to make the model able to run on TPU, the error occurs. This is the full output on…
DiIli
  • 137
  • 1
  • 3
  • 10