
I have developed a TensorFlow model on Cloud ML Engine with scaleTier: BASIC.

Running its trainer experimentally on a GPU with scaleTier: BASIC_GPU works fine, but an attempt to run it on a TPU with scaleTier: BASIC_TPU produces this error message:

type.googleapis.com/google.rpc.QuotaFailure
The request for 1 TPU_V2 accelerators exceeds the allowed maximum
of 30 K80, 30 P100.

Where does this limitation come from, and can it be lifted, e.g. by enabling another API or increasing my initial budget?
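For reference, a job with that scale tier is submitted roughly as follows (the job name, region, bucket, and trainer package names below are illustrative placeholders, not the actual values used):

```shell
# Hypothetical submission sketch: only the --scale-tier value differs
# between the working GPU run and the failing TPU run.
gcloud ml-engine jobs submit training my_tpu_job \
  --scale-tier BASIC_TPU \
  --runtime-version 1.8 \
  --module-name trainer.task \
  --package-path trainer/ \
  --region us-central1 \
  --staging-bucket gs://my-staging-bucket
```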

Drux

2 Answers


As announced at Google Cloud Next '18, Cloud TPUs are now available to everyone, without whitelisting.

To enable them for Cloud ML Engine, go here:

https://cloud.google.com/ml-engine/docs/tensorflow/using-tpus

...scroll down to the heading "Authorize your Cloud TPU to access your project" and follow the instructions there. In short, you need to grant the Cloud TPU service account IAM access to your project's resources.
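That authorization step can be sketched as below. The project ID and service-account address are placeholders; the real Cloud TPU service account for your project is reported by the getConfig call described on that documentation page, and the role shown here is the one the docs recommend:

```shell
# Hypothetical sketch: grant the project's Cloud TPU service account
# (a placeholder address below) the ML service-agent role so the TPU
# can read your training data and write checkpoints.
gcloud projects add-iam-policy-binding my-project-id \
  --member serviceAccount:service-123456789@cloud-tpu.iam.gserviceaccount.com \
  --role roles/ml.serviceAgent
```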

Derek T. Jones

I tried the same thing and got the same result. The documentation implies that TPUs are available to everyone, but that's not the case. To the best of my knowledge, you have to specially request TPU access (I filled out the request but didn't get a response).

MatthewScarpino
  • Yep, that's also my preliminary conclusion. – Drux Dec 25 '17 at 18:50
  • 2
  • 1. Cloud ML Engine TPU is in alpha, and it's whitelist-only. If you want to give it a try, please contact us via cloudml-feedback@google.com. 2. Cloud TPU is a different product, which has its own TPU alpha. Both products offer TPUs: Cloud ML Engine provides a managed service, while Cloud TPU provides a raw VM with a TPU. – Guoqing Xu Dec 25 '17 at 23:50