5

I am trying to run a training job on Google Cloud ML Engine by following this tutorial. Executing this command:

$ gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
    --job-dir=gs://${YOUR_GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
    --module-name object_detection.model_tpu_main \
    --runtime-version 1.10 \
    --scale-tier BASIC_TPU \
    --region us-central1 \
    -- \
    --model_dir=gs://${YOUR_GCS_BUCKET}/train \
    --tpu_zone us-central1 \
    --pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/pipeline.config

returns the following error:

  ERROR: (gcloud.ml-engine.jobs.submit.training) INVALID_ARGUMENT: Field: runtime_version Error: The specified runtime version '1.10' with the Python version '' is not supported for TPU training.  Please specify a different runtime version. See https://cloud.google.com/ml/docs/concepts/runtime-version-list for a list of supported versions
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: The specified runtime version '1.10' with the Python version '' is
      not supported for TPU training.  Please specify a different runtime version.
      See https://cloud.google.com/ml/docs/concepts/runtime-version-list for a list
      of supported versions
    field: runtime_version

Because no Python version was detected, I added a --config=config.yaml argument to the command line:

config.yaml:

trainingInput:
  pythonVersion: "3.5"

but the error barely changed:

...
 - description: The specified runtime version '1.10' with the Python version '3.5'
...

The runtime version list specifies that runtime version 1.10 is compatible with Python 3.5. I also tried other runtime version / Python version combinations that are supposed to work, but my command keeps failing.

Jean Bouvattier
  • Seems you are trying to use Cloud TPU. If you look at [Support for Cloud TPU (Beta)](https://cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list#tpu-support), it says that version [1.9](https://cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list#1.9) is supported. – jdehesa Nov 22 '18 at 11:25

3 Answers

4

@jdehesa is right: the supported version is 1.9. Version 1.10 is not supported for training Cloud TPU models. Change the runtime version by editing this flag:

--runtime-version 1.9
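
For reference, here is a rough sketch of the whole command from the question with only the runtime version changed. It is a dry run by default, so it prints the command instead of submitting anything. The DRY_RUN switch, the submit_training function name and the "my-bucket" default are my own assumptions for illustration; everything else is copied from the question.

```shell
#!/bin/sh
# Sketch only: "my-bucket" is a placeholder default, not a real bucket.
YOUR_GCS_BUCKET="${YOUR_GCS_BUCKET:-my-bucket}"
RUNTIME_VERSION="1.9"    # version reported in this thread as accepted for TPU training
DRY_RUN="${DRY_RUN:-1}"  # hypothetical switch: 1 = print the command, 0 = submit it
JOB_NAME="$(whoami)_object_detection_$(date +%s)"

submit_training() {
  # Build the argument list; everything after the bare "--" is passed to the trainer.
  set -- gcloud ml-engine jobs submit training "${JOB_NAME}" \
    --job-dir="gs://${YOUR_GCS_BUCKET}/train" \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
    --module-name object_detection.model_tpu_main \
    --runtime-version "${RUNTIME_VERSION}" \
    --scale-tier BASIC_TPU \
    --region us-central1 \
    -- \
    --model_dir="gs://${YOUR_GCS_BUCKET}/train" \
    --tpu_zone us-central1 \
    --pipeline_config_path="gs://${YOUR_GCS_BUCKET}/data/pipeline.config"

  if [ "${DRY_RUN}" = "1" ]; then
    # Show the assembled command on one line for review.
    printf '%s\n' "$*"
  else
    # Run gcloud with the arguments exactly as assembled above.
    "$@"
  fi
}

submit_training
```

Run it once as is to check the printed command, then set DRY_RUN=0 to submit. Which runtime versions are accepted for TPU training changes over time, so check the runtime version list before pinning one.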
Alex Riquelme
  • I have the same problem with 1.8 but 1.9 seems to work fine, thanks! – Jean Bouvattier Nov 22 '18 at 13:11
  • You can check the supported CMLE runtime versions for TPU at https://cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list#tpu-support. Currently, the only supported version is 1.9, and the engineering team is working on adding TPU support for TensorFlow 1.11 and 1.12. – lwz1992 Nov 26 '18 at 21:17
2

For information, the supported versions are now 1.11 and 1.12; see the runtime version list.

GChevass
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Karl May 29 '19 at 09:43
0

I had this same issue even after making sure all the versions were compatible.

Once I added this line to the gcloud training command (below the runtime version line) it worked fine.

--python-version 3.7 \
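
As a small sketch of how the two flags fit together: the snippet below only prints the flag pair so you can review it before splicing it into the submit command. The version_flags function name is made up for illustration, and the 1.15 default for the runtime version is an assumption on my part (this answer does not say which runtime version was used); pick whatever pair the runtime version list marks as compatible.

```shell
#!/bin/sh
# Both values must form a pair that the runtime version list marks as compatible.
RUNTIME_VERSION="${RUNTIME_VERSION:-1.15}"  # assumption, not stated in the answer
PYTHON_VERSION="${PYTHON_VERSION:-3.7}"     # value taken from the answer above

version_flags() {
  # Emit the two flags on one line, runtime version first, Python version second.
  printf '%s\n' "--runtime-version ${RUNTIME_VERSION} --python-version ${PYTHON_VERSION}"
}

# Print the flags for review. To use them, place the output among the gcloud
# flags, before the bare "--" that separates the trainer arguments.
version_flags
```

Passing the Python version on the command line has the same intent as setting pythonVersion under trainingInput in config.yaml, so use one or the other consistently.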
Joe Below