
I would like to use TensorFlow 1.3 (and maybe 1.4) on Cloud ML. I'm running jobs on multi-GPU machines on Cloud ML.

I do that by specifying the TensorFlow version in setup.py, as shown below:

from setuptools import setup

# Pin the TensorFlow version that Cloud ML installs on the training workers.
REQUIRED_PACKAGES = ['tensorflow==1.3.0']

setup(
    name='my-image-classification',
    install_requires=REQUIRED_PACKAGES,
    version='1.0',
    packages=['my_image_classification',
              'my_image_classification.foo',
              'my_image_classification.bar',
              'my_image_classification.utils'],
)

Which cuDNN library is installed on Cloud ML? Is it compatible with TensorFlow 1.3 and later?

I was able to start the jobs, but performance is roughly 10x lower than expected, and I'm wondering whether there is a problem with the underlying library linking.

Edit:

I'm now fairly confident that the cuDNN version on Cloud ML doesn't match what TensorFlow 1.3 requires. I noticed that TensorFlow 1.3 jobs are missing the "Creating TensorFlow device (/gpu:0 ...)" log lines that appear when I run a job with the default TensorFlow available on Cloud ML.
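A quick way to check whether the GPU build is actually being used is to list the devices TensorFlow can see and turn on device-placement logging. A minimal sketch (standard TF 1.x APIs; the toy session below is illustrative, not taken from the question):

import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can see; a GPU build with a working CUDA/cuDNN
# install should report one GPU entry per card in addition to the CPU.
print(device_lib.list_local_devices())

# Creating a session initializes the visible GPUs, which is when the
# "Creating TensorFlow device (/gpu:0 ...)" lines are normally logged;
# log_device_placement=True additionally logs which device each op runs on.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0], name='a')
    b = tf.constant([3.0, 4.0], name='b')
    print(sess.run(a + b))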

7hacker
  • Were you actually able to deploy a model on Cloud ML with TensorFlow r1.3? I tried that a couple of weeks ago and it wasn't possible. In fact, the supported runtime versions are r1.0 and r1.2, listed here: cloud.google.com/ml-engine/docs/runtime-version-list. However, I have been able to train with r1.3 with no problems, as explained here: https://cloud.google.com/ml-engine/docs/versioning#specifying_custom_versions_of_tensorflow_for_training. I guess the deployed nodes do a standard installation of whichever REQUIRED_PACKAGES are defined in setup.py, but not for prediction. – Guille Nov 03 '17 at 16:15
  • Yes. It is still not supported as a runtime, but there is a workaround: use the --runtime-version=HEAD command-line argument to your gcloud submit job .. command. You should also specify the REQUIRED_PACKAGES in your setup.py. See the comments below the accepted answer. – 7hacker Nov 03 '17 at 18:40
  • Yeah, that did it :) Thanks! Now we can even train and deploy a model with TF 1.4, but we can't do batch predictions on it, although, surprisingly, we can do online predictions... – Guille Nov 30 '17 at 09:46

1 Answer


DISCLAIMER: using anything other than 1.0 or 1.2 is not officially supported as of 2017/11/01.

You need to specify the GPU-enabled version of TensorFlow:

REQUIRED_PACKAGES = ['tensorflow-gpu==1.3.0']

But the version of pip on the service is out of date, so you need to force it to update first.
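For reference, a minimal sketch of the setup.py from the question with that one-line change applied (package names copied from the question):

from setuptools import setup

# 'tensorflow' is the CPU-only wheel; 'tensorflow-gpu' is the CUDA-enabled build.
REQUIRED_PACKAGES = ['tensorflow-gpu==1.3.0']

setup(
    name='my-image-classification',
    install_requires=REQUIRED_PACKAGES,
    version='1.0',
    packages=['my_image_classification',
              'my_image_classification.foo',
              'my_image_classification.bar',
              'my_image_classification.utils'],
)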

rhaertel80
  • This breaks my imports with the error: import tensorflow as tf ERROR 2017-11-02 12:41:12 -0700 service ImportError: No module named tensorflow – 7hacker Nov 02 '17 at 19:46
  • ah, the `pip` version is too old on the service. Try using --runtime-version=HEAD when submitting your job. Same disclaimer applies. – rhaertel80 Nov 02 '17 at 23:29
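To confirm that the GPU wheel actually ended up installed once the job starts, a short hedged check (standard TF 1.x test utilities) can be logged from the trainer:

import tensorflow as tf

# Report whether the installed TensorFlow was built against CUDA and
# whether a GPU device is visible at runtime.
print('TensorFlow version:', tf.__version__)
print('Built with CUDA:', tf.test.is_built_with_cuda())
print('GPU device:', tf.test.gpu_device_name() or 'none found')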