Questions tagged [google-cloud-tpu]

Google Cloud TPUs (Tensor Processing Units) accelerate machine learning workloads developed using TensorFlow. Use this tag for questions about the Google Cloud TPU service. Topics range from the service user experience and issues with a trainer program written in TensorFlow to project quotas, security, and authentication.

Official website

188 questions
3 votes, 2 answers

Mask R-CNN for TPU on Google Colab

We are trying to build an image segmentation deep learning model using Google Colab TPU. Our model is Mask R-CNN. TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR'] import tensorflow as tf tpu_model =…
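The excerpt above builds the TPU worker address from the COLAB_TPU_ADDR environment variable. A minimal sketch of that step, assuming the variable is only set when a TPU runtime is attached; the helper name colab_tpu_worker and its env parameter are hypothetical and not part of the original question:

```python
import os


def colab_tpu_worker(env=None):
    """Return 'grpc://<address>' built from COLAB_TPU_ADDR, or None if unset.

    `env` is a hypothetical parameter (defaults to os.environ) so the
    function can be exercised without a real Colab TPU runtime.
    """
    env = os.environ if env is None else env
    addr = env.get("COLAB_TPU_ADDR")
    if not addr:
        # No TPU address available: likely a CPU or GPU runtime.
        return None
    return "grpc://" + addr
```

Checking for None before passing the address on avoids the KeyError that os.environ['COLAB_TPU_ADDR'] raises when no TPU runtime is selected.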
3 votes, 2 answers

Panic error on google cloud TPU

I can open a ctpu session and get the code I need from my git repository, but when I run my TensorFlow code from the cloud shell, I get a message saying that there is no TPU and my program crashes. Here is the error message I…
3 votes, 0 answers

GCE VM can't connect to TPU

I've been following the instructions at https://cloud.google.com/tpu/docs/custom-setup and now I'm trying to run a tiny example from https://cloud.google.com/tpu/docs/quickstart, but it hangs on sess.run(tpu.initialize_system()). I suspect that it…
2 votes, 2 answers

Tensorboard profiling a predict call using Cloud TPU Node

I've been trying to profile a predict call of a custom NN model using a Cloud TPU v2-8 Node. It is important to say that my prediction call takes about 2 minutes to finish and I do it using data divided in TFRecord batches. I followed the official…
2 votes, 1 answer

TPU returning "failed call to cuInit: UNKNOWN ERROR (303)" on Google Cloud with Kubernetes Cluster

I am trying to use a TPU with Google Cloud's Kubernetes engine. My code returns several errors when I try to initialize the TPU, and any other operations only run on the CPU. To run this program, I am transferring a Python file from my Dockerhub…
2 votes, 1 answer

Google Colab TPU Version

How do I print in Google Colab which TPU version I am using and how much memory the TPUs have? With I get the following Output tpu =…
user14588808
2 votes, 0 answers

Can't create Google Cloud TPU: an internal error has occurred code 13

I can't create a Google Cloud TPU: gcloud compute tpus create bert-tpu --version=1.15 --preemptible --zone=europe-west4-a Create request issued for: [bert-tpu] Waiting for operation [projects//locations/europe-west4-a/operations/] to…
webb • 4,180 • 1 • 17 • 26
2 votes, 1 answer

Error with TPUClusterResolver for Cloud TPU v3 Pod with TensorFlow 2.1

I'm trying to use my (preemptible) Cloud TPU v3-256 from my Google Compute Engine VM with TensorFlow 2.1, but it doesn't seem to work: the TPUClusterResolver throws a "Could not lookup TPU metadata" error. Using individual…
2 votes, 2 answers

GCP and TPU, experimental_connect_to_cluster give no response

I am trying to use a TPU on GCP with TensorFlow 2.1 and the Keras API. Unfortunately, I am stuck after creating the TPU node: my VM seems to "see" the TPU but cannot connect to it. The code I am using: resolver =…
Shiro • 795 • 1 • 7 • 23
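For context on the question above, this is a hedged sketch of the connection sequence commonly used with TensorFlow 2.1. It is not the asker's code, which is truncated in the excerpt. The function name connect_to_tpu and the tpu_name parameter are hypothetical. TensorFlow is imported inside the function so the definition loads on machines without it.

```python
def connect_to_tpu(tpu_name):
    """Connect to a Cloud TPU node and return a distribution strategy.

    Assumes TensorFlow 2.1 and a reachable TPU node called `tpu_name`
    (hypothetical parameter). Later TensorFlow releases expose the
    strategy as tf.distribute.TPUStrategy instead.
    """
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.1

    # Resolve the TPU node's address from its name.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_name)
    # Make the TPU workers visible to this process.
    tf.config.experimental_connect_to_cluster(resolver)
    # Initialize the TPU system before creating the strategy.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.experimental.TPUStrategy(resolver)
```

If experimental_connect_to_cluster never returns, one thing to check is whether the VM and the TPU node are in the same network and zone and whether the TPU's TensorFlow version matches the VM's. That is a general troubleshooting assumption, not a diagnosis of this particular question.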
2 votes, 1 answer

No OpKernel was registered to support Op 'TPUReplicateMetadata' used by node TPUReplicateMetadata

When I run the following .ipynb: https://colab.research.google.com/drive/1DpUCBm58fruGNRtQL_DiSVbT90spdZgm I got: No OpKernel was registered to support Op 'TPUReplicateMetadata' used by node TPUReplicateMetadata (defined at…
user7862197
2 votes, 1 answer

Colab TPU: TensorFlow '2.0.0-beta0' LinearClassifier .train Bug

Attempting to get LinearClassifier running with Colab TPU. https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/estimator/LinearClassifier TPUStrategy is supported in TensorFlow 2.0…
Machine Learning • 485 • 6 • 15
2 votes, 1 answer

How does BERT utilize TPU memories?

The README in Google's BERT repo says that even a single sentence of length 512 cannot fit in a 12 GB Titan X for the BERT-Large model. But the BERT paper says that 64 TPU chips are used to train BERT-Large with a maximum length of 512 and batch size…
2 votes, 1 answer

Converting code from keras to tf.keras causes problems

I am learning machine translation in Keras using the code from this article. The article's code works fine on GPU and CPU as-is. Now I want to take advantage of Google Colab TPUs. The code doesn't run on a TPU as-is, so I need to move toward tf.keras. …
Lars Ericson • 1,952 • 4 • 32 • 45
2 votes, 2 answers

Simple model can't run on tpu (on colab)

I have problems running a very simple model on a TPU in Google Colab. I have distilled it to a very simple program. I suspect it doesn't like the nested models (input_2?), but I have no idea how to solve this: import numpy as np import os import…
Moshel • 400 • 3 • 13
2 votes, 2 answers

How to save Keras model trained on TPU?

I'm using the Colab environment to experiment with an LSTM model, but I cannot save the trained model. sess = tf.keras.backend.get_session() training_model = lstm_model(seq_len=100, batch_size=128, stateful=False) tpu_model =…
psu • 111 • 1 • 10