Questions tagged [google-cloud-tpu]

Google Cloud TPUs (Tensor Processing Units) accelerate machine learning workloads built with TensorFlow. Use this tag for questions about the Google Cloud TPU service, including the service user experience, issues with trainer programs written in TensorFlow, project quotas, security, and authentication.

Official website

188 questions
0 votes · 1 answer

How to train a BERT model with SQuAD 2.0 on Cloud TPU v2?

Disclaimer: I am very new to neural networks and TensorFlow. I am trying to create a QA application where the user asks a question and the application gives the answer. Most of the traditional methods I tried did not work or were not accurate enough or…
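For context, SQuAD 2.0 adds unanswerable questions to the original SQuAD format (the reference run_squad.py in the google-research/bert repo exposes a flag for this, version_2_with_negative). A minimal sketch of reading the SQuAD 2.0 JSON layout; the file name is illustrative:

    import json

    # Illustrative file name; the official train split ships as train-v2.0.json.
    with open('train-v2.0.json') as f:
        squad = json.load(f)

    for article in squad['data']:
        for paragraph in article['paragraphs']:
            context = paragraph['context']
            for qa in paragraph['qas']:
                question = qa['question']
                # SQuAD 2.0 marks unanswerable questions with 'is_impossible'.
                answers = [] if qa.get('is_impossible') else [a['text'] for a in qa['answers']]
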
0 votes · 1 answer

Is there any workaround to use multiple "correct" metrics for Keras training on TPU?

I made a small model using Keras on Google Colaboratory, and I see wrong metric values when I run training on a TPU. When I run training on CPU/GPU, the m1 and m2 metrics show correct numbers (see the code below), but after I change the runtime type to TPU, m1…
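For reference, a minimal sketch of a small Keras model with two custom metrics of the kind described; the names m1/m2 follow the question, while the metric bodies and model are hypothetical:

    import tensorflow as tf
    from tensorflow.keras import backend as K

    # Hypothetical custom metrics; the names m1/m2 follow the question.
    def m1(y_true, y_pred):
        return K.mean(K.abs(y_true - y_pred))

    def m2(y_true, y_pred):
        return K.mean(K.square(y_true - y_pred))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='sgd', loss='mse', metrics=[m1, m2])
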
0 votes · 1 answer

shape cannot be sharded 8 ways along dimension 0

I'm trying to run a custom estimator on Compute Engine and a Google Cloud TPU, but I get an error: ValueError: shape [2] cannot be sharded 8 ways along dimension 0. I have no idea what causes it or how to fix it. Any idea? Andy P.
andrew Patterson
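This error typically appears when TPUEstimator splits each batch across the 8 TPU cores and a tensor's leading dimension (here 2) is not divisible by 8. One common pattern is to build the input_fn around the per-core batch size that TPUEstimator injects and keep batch shapes static; a sketch with dummy data:

    import numpy as np
    import tensorflow as tf

    def input_fn(params):
        # TPUEstimator passes the per-core batch size in params['batch_size'];
        # use it instead of a hard-coded global batch size.
        batch_size = params['batch_size']
        # Dummy data standing in for the real features/labels.
        features = np.random.rand(1024, 8).astype(np.float32)
        labels = np.random.randint(0, 2, size=(1024,)).astype(np.int32)
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.repeat().shuffle(1024)
        # drop_remainder=True keeps the batch dimension static so every batch
        # can be split evenly across the cores.
        return dataset.batch(batch_size, drop_remainder=True)
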
0 votes · 1 answer

Connection timeout when connecting to a Google Cloud TPU from a notebook

I am trying to train a model using BERT and hope to fine-tune its parameters using my own dataset. I am using Google Cloud Platform and a TPU to accelerate the training process. I am following this tutorial, just replaced…
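One way to narrow down a timeout like this is to resolve the TPU explicitly and print its gRPC endpoint before starting any training; a sketch assuming the TF 1.x contrib API and hypothetical node, zone, and project names:

    import tensorflow as tf

    resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
        tpu='my-tpu-node', zone='us-central1-b', project='my-gcp-project')

    # If this call hangs or times out, the notebook VM cannot reach the TPU's
    # gRPC endpoint (wrong zone/project, TPU not in READY state, or a network issue).
    print(resolver.get_master())
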
0 votes · 1 answer

Dimension error when training on TPU with sparse_categorical_accuracy and label-encoded data

I am trying to use Google Colab's free TPU for my training and have created a dataset with tf.data. My y_label is label-encoded data with 7 labels, and I get this error: InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got…
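This error usually means the label shape and the loss/metric do not match: sparse_categorical_* expects integer class indices of shape (batch,), while categorical_* expects one-hot vectors of shape (batch, num_classes). A sketch of the two consistent pairings; the model and optimizer are placeholders:

    import numpy as np
    import tensorflow as tf

    num_classes = 7
    y_int = np.random.randint(0, num_classes, size=(32,))           # shape (32,)
    y_onehot = tf.keras.utils.to_categorical(y_int, num_classes)    # shape (32, 7)

    # Pairing 1: integer labels with the sparse_* loss and metric.
    # model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
    #               metrics=['sparse_categorical_accuracy'])
    # model.fit(x, y_int, ...)

    # Pairing 2: one-hot labels with the non-sparse variants.
    # model.compile(optimizer='adam', loss='categorical_crossentropy',
    #               metrics=['categorical_accuracy'])
    # model.fit(x, y_onehot, ...)
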
0 votes · 1 answer

Use already existing VM with TPU

According to the ctpu documentation I can use the following commands: status, up, pause and delete, where up does the following: "ctpu up will create a Compute Engine VM with TensorFlow pre-installed". However, I already have a VM on GCP that I am working with…
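If the TPU node is created on its own (for example from the Cloud console or with gcloud compute tpus create) instead of letting ctpu up create a paired VM, the existing VM can simply resolve it by name; a TF 1.x sketch with hypothetical names:

    import tensorflow as tf

    # 'my-separate-tpu' and the bucket below are placeholders for your own resources.
    resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu='my-separate-tpu')

    run_config = tf.contrib.tpu.RunConfig(
        cluster=resolver,
        model_dir='gs://my-bucket/model',
        tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=100))
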
0 votes · 0 answers

Problem running python script using TPU on VM instance

I created a TPU and a VM instance with the same name via the Cloud console (not ctpu or gcloud). When I check the TPU on the VM with the command gcloud compute tpus list, my TPU shows READY. But when I run a Python script: from tensorflow.contrib.cluster_resolver import…
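Assuming a TF 1.x environment that matches the contrib import in the question, a quick end-to-end sanity check is to resolve the TPU by name and bring the TPU system up and down in a session; the instance name is a stand-in:

    import tensorflow as tf
    from tensorflow.contrib.cluster_resolver import TPUClusterResolver

    # The TPU node and VM share a name in the question; 'my-instance' is a placeholder.
    resolver = TPUClusterResolver(tpu='my-instance')

    with tf.Session(resolver.get_master()) as sess:
        sess.run(tf.contrib.tpu.initialize_system())
        sess.run(tf.contrib.tpu.shutdown_system())
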
0 votes · 2 answers

Using the same TPU model for training and inference (prediction) in Google Colab

I have code something like this: def getModel(): model = Sequential() model.add(...) ..... model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=tf.contrib.tpu.TPUDistributionStrategy( …
Gokul NC
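One possible pattern with the TF 1.x contrib API shown in the excerpt is to train the converted model on the TPU, then copy the trained weights back into an identical CPU model for prediction; a sketch in which the model architecture, file path, and Colab TPU address variable are illustrative:

    import os
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def get_model():
        model = models.Sequential()
        model.add(layers.Dense(64, activation='relu', input_shape=(20,)))
        model.add(layers.Dense(10, activation='softmax'))
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
        return model

    resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
        tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        get_model(),
        strategy=tf.contrib.tpu.TPUDistributionStrategy(resolver))

    # ... tpu_model.fit(...) ...

    # Copy the TPU-trained weights into an identical CPU model for inference.
    tpu_model.save_weights('/tmp/weights.h5')
    cpu_model = get_model()
    cpu_model.load_weights('/tmp/weights.h5')
    # predictions = cpu_model.predict(x_test)
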
0 votes · 1 answer

Is Google Colab TPU bugged, or am I doing something wrong?

I am trying to port my DCGAN to TPU, but I get this error: Error recorded from training_loop: File system scheme '[local]' not implemented. Here is the notebook: https://colab.research.google.com/drive/101FjBAIMVuXyNyeUvq_Vfx-Z6CR3g4df
had
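The '[local]' error generally means the TPU-side TensorFlow server was handed a path on the Colab VM's local disk; with the TF 1.x TPU node architecture the TPU reads checkpoints and data itself, so model_dir and dataset paths need to point at Cloud Storage. A sketch in which the bucket name is hypothetical:

    import os
    import tensorflow as tf

    resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
        tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])

    run_config = tf.contrib.tpu.RunConfig(
        cluster=resolver,
        model_dir='gs://my-dcgan-bucket/model',   # gs://, not '/content/...' or './model'
        tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=100))
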
0 votes · 2 answers

training MNIST with TPU generates errors

Following the Running MNIST on Cloud TPU tutorial, I get the following error when I try to train: python /usr/share/models/official/mnist/mnist_tpu.py \ --tpu=$TPU_NAME \ --DATA_DIR=${STORAGE_BUCKET}/data \ --MODEL_DIR=${STORAGE_BUCKET}/output…
Alex Ryan
0 votes · 1 answer

Error processing Timeseries tensorflow notebook on TPU

Here's a Timeseries notebook I used from the good work by Magnus Erik Hvass Pedersen (thanks for that): https://colab.research.google.com/drive/1F6CuGVWN5TNgIjqxdu5glFeGBEr71TgO I have had success running a version of this notebook via Google Colab…
0 votes · 1 answer

TPU utilization low due to output fusion

I am training a U-Net on a Google Cloud TPU. It works, but the utilization is very low. Since I cannot upload the traced profile here, a screenshot of the slowest part is attached: the output fusion is the most harmful part, with 58%…
0 votes · 1 answer

Getting weird results in TensorBoard in the Profile Tab

I am getting really fishy results in my TensorBoard Profile calculations. It seems that my host idle time (not sure what host this is referring to?) is really high, which is super bad, but my TPU idle time is 0%, which is super good. Also…
craft
0 votes · 1 answer

Rewriting feed_dict tf.Session and tf.Graph code to Estimator

I have some code that was written with feed_dict against the low-level tf.Session and tf.Graph API, and since I want to use it on a TPU I am trying to rewrite it with the tf.estimator API. Below is the current version of the code (some fragments are removed…
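A minimal TF 1.x sketch of the shape such a rewrite usually takes: the hand-built graph and feed_dict move into a model_fn, and TPUEstimator drives the session. Layer sizes, names, and the bucket are placeholders:

    import tensorflow as tf

    def model_fn(features, labels, mode, params):
        # The graph formerly built by hand and fed via feed_dict lives here;
        # features and labels now come from an input_fn instead of placeholders.
        logits = tf.layers.dense(features, 10)
        loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
        # On TPU the optimizer is wrapped so gradients are summed across shards.
        optimizer = tf.contrib.tpu.CrossShardOptimizer(tf.train.AdamOptimizer())
        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
        return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    estimator = tf.contrib.tpu.TPUEstimator(
        model_fn=model_fn,
        config=tf.contrib.tpu.RunConfig(
            cluster=tf.contrib.cluster_resolver.TPUClusterResolver(tpu='my-tpu'),
            model_dir='gs://my-bucket/model',
            tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=100)),
        use_tpu=True,
        train_batch_size=128)
    # estimator.train(input_fn=my_input_fn, max_steps=1000)
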
0 votes · 2 answers

GCE: How to use ImageDataGenerator on a Google TPU

I have a Keras model, but it's too big for my local PC, so I'm trying to migrate to Google Cloud to be able to use a TPU. The examples I have seen use in-memory images to train the model with the fit function. I have thousands of images, and also I…
Mquinteiro
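A common substitute for ImageDataGenerator on TPU is a tf.data pipeline that streams images straight from Cloud Storage; a TF 1.x sketch in which the bucket, image size, and constant label are placeholders:

    import tensorflow as tf

    IMG_SIZE = 224

    def parse_image(path):
        # Decode and resize one image; deriving the label from the path is
        # application specific, so a dummy constant label keeps this self-contained.
        image = tf.image.decode_jpeg(tf.read_file(path), channels=3)
        image = tf.image.resize_images(image, [IMG_SIZE, IMG_SIZE]) / 255.0
        return image, tf.constant(0, dtype=tf.int32)

    def make_dataset(batch_size):
        # The TPU reads files itself, so they must live in GCS, not on local disk.
        files = tf.data.Dataset.list_files('gs://my-bucket/images/*.jpg')
        dataset = files.map(parse_image, num_parallel_calls=8)
        dataset = dataset.repeat().shuffle(1024)
        return dataset.batch(batch_size, drop_remainder=True).prefetch(1)
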