Google Cloud TPUs (Tensor Processing Units) accelerate machine learning workloads developed using TensorFlow. This tag is for questions about using the Google Cloud TPU service. Topics range from the service user experience and issues with training programs written in TensorFlow to project quotas, security, and authentication.
Questions tagged [google-cloud-tpu]
188 questions
6 votes, 3 answers
Check TPU workload/utilization
I am training a model, and when I open the TPU in the Google Cloud Platform console, it shows me the CPU utilization (on the TPU, I suppose). It is really, really low (like 0.07%), so maybe it is the VM CPU? I am wondering whether the training is…

craft
6 votes, 2 answers
TPU Classifier InvalidArgumentError: No OpKernel was registered to support Op 'CrossReplicaSum' with these attrs
I have attempted unsuccessfully to implement an Estimator-based TensorFlow model using the TPUEstimator API. It hits an error during training:
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op…

eLillie
5 votes, 0 answers
Huggingface Bert TPU fine-tuning works on Colab but not in GCP
I'm trying to fine-tune a Huggingface transformers BERT model on TPU. It works in Colab but fails when I switch to a paid TPU on GCP. Jupyter notebook code is as follows:
[1] model =…

user9676571
5 votes, 2 answers
Google Colab: Why is CPU faster than TPU?
I'm using a Google Colab TPU to train a simple Keras model. Removing the distributed strategy and running the same program on the CPU is much faster than on the TPU. How is that possible?
import timeit
import os
import tensorflow as tf
from sklearn.datasets…

Sami Belkacem
5 votes, 2 answers
RuntimeError: Mixing different tf.distribute.Strategy objects
Hello! I have encountered some problems when compiling the model on a TPU. Part of the code is as follows:
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TF_MASTER)
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy =…

o sy
5 votes, 2 answers
How to use trained BERT model checkpoints for prediction?
I trained BERT on SQuAD 2.0 using BERT-master/run_squad.py and got model.ckpt.data, model.ckpt.meta, and model.ckpt.index (F1 score: 81) in the output directory, along with predictions.json, etc.
python run_squad.py \
…

Jeeva Bharathi
5 votes, 1 answer
Keras TPU. Compilation failure: Detected unsupported operations
I am trying to run my Keras UNet model on a Google Colab TPU, and I ran into this problem with UpSampling2D. Are there any solutions or workarounds?
Code to run:
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from…

Victor Papenko
5 votes, 1 answer
TPU slower than GPU?
I just tried using a TPU in Google Colab to see how much faster it is than a GPU. Surprisingly, I got the opposite result.
The following is the NN.
random_image = tf.random_normal((100, 100, 100, 3))
result =…

fatdragon
5 votes, 1 answer
generative models with tensorflow's tpu_estimator()?
Is it possible to train a generative model (e.g., a variational autoencoder with a custom loss calculation) with TensorFlow's tpu_estimator()?
Simplified version of my VAE:
Model Function
def model_fn(features, labels, mode, params):
#Encoder layers …

Justin Greenberg
4 votes, 3 answers
Can you use a Jupyter notebook on my GCP VM to run TPU training in Google Cloud?
I am switching from running TPUs in Colab to running TPUs in Google Cloud. I am used to running training in the Colab Jupyter notebook, but from the GCP TPU quickstart guide, I'll need to use the shell script, and convert my code into a…

SantoshGupta7
4 votes, 2 answers
Colab tells me to create a bucket, but where?
When using TPUs on Google Colab (such as in the MNIST example), we are told to create a GCS bucket. However, it doesn't tell us where. Without knowing the region/zone of the Colab instance, I am afraid to create a bucket in fear of running into…

David Nemeskey
4 votes, 1 answer
While running on a TPU instance on Google Colab getting InternalError: Failed to serialize message
I'm trying to train a model on Google Colab using a TPU for a college project. I'm using TensorFlow 1.15.0. Now, as I understand from the TPU examples, I'm converting the tf.keras.models.Model instance to a TPU compatible one with an appropriate…

Bhargav Desai
4 votes, 0 answers
Best way to train a CNN on Google Colab TPU
I am trying to train a CNN (ResNet50 for now) using Keras on Google Colab with their TPU support. The TPU VM on Colab has a small local disk size, so I cannot fit my training images on it.
I tried uploading the train/test images to Google Drive but…

kg_sYy
4 votes, 1 answer
Google Colab TPU takes more time than GPU
Below is the code I am using. I commented out the line that converts my model to the TPU model. On a GPU, an epoch over the same amount of data takes 7 seconds, while on a TPU it takes 90 seconds.
Inp = tf.keras.Input(name='input',…

mihirjoshi
4 votes, 2 answers
How to save a TensorFlow checkpoint file from Google Colaboratory when using TPU mode?
When I use saver = tf.train.Saver() and save_path = saver.save(session, "checkpointsFolder/checkpoint.ckpt")
I get an UnimplementedError (see above for traceback): File system scheme '[local]' not implemented
Here is the full…

SantoshGupta7