Questions tagged [google-cloud-tpu]

Google Cloud TPUs (Tensor Processing Units) accelerate machine learning workloads developed using TensorFlow. This tag is for questions about using the Google Cloud TPU service. Topics include the service user experience, issues with a training program written in TensorFlow, project quota issues, security, and authentication.

188 questions
6 votes, 3 answers

Check TPU workload/utilization

I am training a model, and when I open the TPU in the Google Cloud Platform console, it shows me the CPU utilization (of the TPU, I suppose). It is really, really low (around 0.07%), so maybe it is the VM's CPU? I am wondering whether the training is…
6 votes, 2 answers

TPU Classifier InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' with these attrs

I have tried unsuccessfully to implement an Estimator-based TensorFlow model using the TPUEstimator API. It hits an error during training: InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op…
5 votes, 0 answers

Huggingface Bert TPU fine-tuning works on Colab but not in GCP

I'm trying to fine-tune a Huggingface transformers BERT model on TPU. It works in Colab but fails when I switch to a paid TPU on GCP. Jupyter notebook code is as follows: [1] model =…
5 votes, 2 answers

Google Colab: Why is CPU faster than TPU?

I'm using a Google Colab TPU to train a simple Keras model. Removing the distributed strategy and running the same program on the CPU is much faster than on the TPU. How is that possible? import timeit import os import tensorflow as tf from sklearn.datasets…
5 votes, 2 answers

RuntimeError: Mixing different tf.distribute.Strategy objects

Hello! I have encountered some problems when compiling a model on a TPU. Part of the code is as follows: resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TF_MASTER) tf.contrib.distribute.initialize_tpu_system(resolver) strategy =…
o sy
5 votes, 2 answers

How to use trained BERT model checkpoints for prediction?

I trained BERT on SQuAD 2.0 and got model.ckpt.data, model.ckpt.meta, and model.ckpt.index (F1 score: 81) in the output directory, along with predictions.json, etc., using BERT-master/run_squad.py: python run_squad.py \ …
5 votes, 1 answer

Keras TPU. Compilation failure: Detected unsupported operations

I am trying to run my Keras UNet model on a Google Colab TPU and ran into this problem with UpSampling2D. Are there any solutions or workarounds? Code to run: import os import numpy as np import tensorflow as tf from tensorflow.keras.models import Sequential from…
5 votes, 1 answer

TPU slower than GPU?

I just tried using a TPU in Google Colab to see how much faster it is than a GPU. Surprisingly, I got the opposite result. The following is the NN. random_image = tf.random_normal((100, 100, 100, 3)) result =…
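A common pitfall in comparisons like this one is timing the very first call, which on an accelerator may include one-time setup such as compilation. The following is a minimal sketch of a warm-up-then-measure timing helper, using only the standard library. The name `time_calls` and the `fake_step` workload are hypothetical and not taken from the question; in practice `fn` would wrap whatever step is being benchmarked.

```python
import time


def time_calls(fn, warmup=1, iters=5):
    """Return the mean wall-clock seconds per call of fn, excluding warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are run but not timed
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    calls = []

    def fake_step():
        # Hypothetical stand-in workload: slow on the first call, fast afterwards.
        time.sleep(0.05 if not calls else 0.001)
        calls.append(1)

    print(f"mean step time after warm-up: {time_calls(fake_step):.4f} s")
```

Timing several calls after a warm-up gives a steadier figure than a single first call, though it does not by itself explain a slowdown: very small workloads can still be dominated by transfer and launch overhead.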
5 votes, 1 answer

Generative models with TensorFlow's tpu_estimator()?

Is it possible to train a generative model (e.g., a variational autoencoder with a custom loss calculation) with TensorFlow's tpu_estimator()? Simplified version of my VAE: Model Function def model_fn(features, labels, mode, params): #Encoder layers …
4 votes, 3 answers

Can I use a Jupyter notebook on my GCP VM to run TPU training in Google Cloud?

I am switching from running TPUs in Colab to running TPUs in Google Cloud. I am used to running training in a Colab Jupyter notebook, but according to the GCP TPU quickstart guide, I'll need to use the shell script and convert my code into a…
4 votes, 2 answers

Colab tells me to create a bucket, but where?

When using TPUs on Google Colab (such as in the MNIST example), we are told to create a GCS bucket. However, we are not told where. Without knowing the region/zone of the Colab instance, I hesitate to create a bucket for fear of running into…
David Nemeskey
4 votes, 1 answer

Getting InternalError: Failed to serialize message while running on a TPU instance on Google Colab

I'm trying to train a model on Google Colab using a TPU for a college project. I'm using TensorFlow 1.15.0. As I understand from the TPU examples, I'm converting the tf.keras.models.Model instance to a TPU-compatible one with an appropriate…
4 votes, 0 answers

Best way to train a CNN on Google Colab TPU

I am trying to train a CNN (ResNet50 for now) using Keras on Google Colab with their TPU support. The TPU VM on Colab has a small local disk, so I cannot fit my training images on it. I tried uploading the train/test images to Google Drive but…
kg_sYy
4 votes, 1 answer

Google Colab TPU takes more time than GPU

Below is the code I am using. I commented out the line that converts my model to the TPU model. On a GPU, with the same amount of data, an epoch takes 7 seconds, while on a TPU it takes 90 seconds. Inp = tf.keras.Input(name='input',…
mihirjoshi
4 votes, 2 answers

How to save a TensorFlow checkpoint file from Google Colaboratory when using TPU mode?

When I use saver = tf.train.Saver() and save_path = saver.save(session, "checkpointsFolder/checkpoint.ckpt"), I get the error UnimplementedError (see above for traceback): File system scheme '[local]' not implemented. Here is the full…
SantoshGupta7
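
The error in this excerpt is typically reported when the save path points at the local filesystem, which a remote TPU worker cannot write to; a Cloud Storage path (`gs://...`) is the commonly suggested alternative. The following is a minimal sketch of a guard that rejects non-GCS checkpoint paths before a save is attempted. The helper name `require_gcs_path` and the bucket name `my-bucket` are hypothetical, and the check only inspects the path string; it does not contact Cloud Storage.

```python
from urllib.parse import urlparse


def require_gcs_path(path):
    """Return path unchanged if it looks like gs://bucket/..., else raise ValueError."""
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(
            f"checkpoint path {path!r} is not a Cloud Storage path (expected gs://bucket/...)"
        )
    return path


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(require_gcs_path("gs://my-bucket/checkpointsFolder/checkpoint.ckpt"))
    try:
        require_gcs_path("checkpointsFolder/checkpoint.ckpt")
    except ValueError as err:
        print("rejected:", err)
```

Failing early on the path string makes the cause easier to spot than the filesystem error raised later on the worker.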