Questions tagged [tpu]

Use this tag for questions about the Tensor Processing Unit (TPU), an application-specific integrated circuit (ASIC) developed by Google to accelerate machine learning workloads.

421 questions
2 votes
0 answers

Tokenize dataset using map on tf.data.Dataset.from_tensor_slices(....)

Note: I am using the free TPU provided on Kaggle. I want to tokenize the text with transformers one batch at a time during training, instead of first tokenizing the whole dataset and then creating batches from the tokenized…
Abhishek Prajapat
2 votes
2 answers

How to clear Colab TensorFlow TPU memory

I am running a model over several folds. After each fold, I want to clear the TPU memory so that I don't get an OOM error. Full trace of the current error: ResourceExhaustedError Traceback (most recent call…
Abhishek Prajapat
2 votes
1 answer

TPU returning "failed call to cuInit: UNKNOWN ERROR (303)" on Google Cloud with Kubernetes Cluster

I am trying to use a TPU with Google Cloud's Kubernetes engine. My code returns several errors when I try to initialize the TPU, and any other operations only run on the CPU. To run this program, I am transferring a Python file from my Dockerhub…
2 votes
0 answers

Using a TPU with spaCy

Is it possible to use a TPU with spaCy? I know that you can use a GPU with spacy.prefer_gpu(). Is there something similar for TPUs? Thanks in advance!
pineapps
2 votes
1 answer

UnimplementedError: File system scheme '[local]' not implemented

I am getting an error while running TensorFlow on a TPU: UnimplementedError: File system scheme '[local]' not implemented (file: '1.png'). I know this question has been answered before, but my issue is different: I am getting this error when I…
Talha Anwar
2 votes
0 answers

How to gather prediction results on TPU (PyTorch)?

I'm trying to fine-tune my BERT-based QA model (PyTorch) with the TPU v3-8 provided by Kaggle. In the validation process, I used a ParallelLoader to make predictions on 8 cores at the same time. But after that, I don't know what I should do to gather all…
2 votes
1 answer

TensorFlow: load saved model, predict and evaluate. Accuracy too low on test?

I have trained my model on a TPU and the results seem good for testing. The dataset has 5 classes and the results show: accuracy: 0.9867 - sparse_categorical_accuracy: 0.9867 - loss: 0.0412 - val_accuracy: 0.9859 - val_sparse_categorical_accuracy: 0.9859 -…
Nobat
2 votes
1 answer

Kaggle TPU: failed to connect to all addresses

I'm facing some problems while trying to fit my model using a TPU on Kaggle. The TPU is already initialized: try: tpu = tf.distribute.cluster_resolver.TPUClusterResolver() print(f'Running on TPU {tpu.master()}') except ValueError: tpu = None if…
rdn
2 votes
0 answers

Training a Keras model using TPU pods?

I was wondering if anyone has an example of using a Keras model on a TPU pod. I have a model-creation method that returns a Keras model compiled within a TPU strategy scope, as recommended by many examples on using TPUs with Keras. This…
st0ne
2 votes
1 answer

Train model on Colab TPU with distributed strategy

I'm trying to train and run an image classification model on Colab using a TPU, without PyTorch. I know that the TPU works only with files from GCS buckets, so I load the dataset from a bucket, and I also commented out the checkpoint and logging functions, to not…
2 votes
1 answer

Use TPU in Google Colab

I am currently training a neural network with the help of a TPU. I changed the runtime type and initialized the TPU, but I have the feeling that it is still not faster. I used https://www.tensorflow.org/guide/tpu. Did I do something wrong? # TPU…
user14576365
2 votes
1 answer

TF 2.3: using experimental_steps_per_execution in model.compile causes a drop in model performance

Using a TPU, I tried passing experimental_steps_per_execution to model.compile(...). I do see a big speedup, but with the exact same learning rate schedule, I noticed a 2-3% drop in accuracy when training is done. In summary, the only thing I changed…
kawingkelvin
2 votes
0 answers

How to reduce TPU idle time?

I'm getting about 99.7% TPU idle time with my training code (https://github.com/ksjae/KoGPT2-train). What are the general methods for reducing idle time? How can I (or any user in general) reduce it to a sane amount? How can I find the culprit of…
efe23eds
2 votes
1 answer

RPC failed with status = "Unavailable: Socket closed" Error when training FairSeq RoBERTa on Cloud TPU using PyTorch

I followed the tutorial "Pre-training FairSeq RoBERTa on Cloud TPU using PyTorch" to set up a preemptible (v2-8) TPU environment and train my RoBERTa model. The PyTorch environment is based on torch-xla-1.6, as instructed by the document. However, it does not output…
user3786340
2 votes
0 answers

Google Colab doesn't get the file from GCS bucket

I am trying to train a model from this repo with a TPU, which requires that all input files and the model directory use a Cloud Storage bucket. I created a bucket and uploaded all the files of the model, but Google Colab cannot read the path of my…
huy