I am trying to run a large distributed TensorFlow model on Google Cloud ML Engine and am having trouble understanding what should go in tf.train.ClusterSpec.

When you run a job on Google Cloud, you can select the scale tier from BASIC, STANDARD_1, PREMIUM_1, BASIC_GPU or CUSTOM, each giving you access to a different type of cluster. However, I can't find the names or addresses of the machines in these clusters.

Miguel Monteiro

1 Answer

Please take a look at the documentation and sample here. Rather than hard-coding machine names or addresses, build the ClusterSpec from the TF_CONFIG environment variable, which the service sets on each machine in the job; e.g.

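  # Excerpt from inside a function; needs `import json`, `import os` and
  # `import tensorflow as tf`. `run` is defined elsewhere in the sample.
  tf_config = os.environ.get('TF_CONFIG')

  # If TF_CONFIG is not set, run locally
  if not tf_config:
    return run('', True, *args, **kwargs)

  # TF_CONFIG is a JSON string; its 'cluster' entry maps job names to host lists
  tf_config_json = json.loads(tf_config)
  cluster = tf_config_json.get('cluster')
  ...
  cluster_spec = tf.train.ClusterSpec(cluster)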
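
To make the shape of TF_CONFIG more concrete, here is a minimal, standalone sketch of the parsing step. It is not taken from the sample: the helper name parse_tf_config and the host names in the example value are made up for illustration, and it only assumes that TF_CONFIG is a JSON string with a 'cluster' entry and a 'task' entry holding 'type' and 'index'.

```python
import json
import os


def parse_tf_config(raw=None):
    """Return (cluster, job_name, task_index) from a TF_CONFIG JSON string.

    Reads the TF_CONFIG environment variable when raw is None. Returns
    (None, None, None) when the value is missing or empty, which a caller
    can treat as "run locally".
    """
    if raw is None:
        raw = os.environ.get('TF_CONFIG')
    if not raw:
        return None, None, None
    config = json.loads(raw)
    cluster = config.get('cluster')
    task = config.get('task', {})
    return cluster, task.get('type'), task.get('index')


# Hypothetical TF_CONFIG value; host names and ports are invented for this example.
example = json.dumps({
    'cluster': {
        'master': ['master-0.example:2222'],
        'worker': ['worker-0.example:2222', 'worker-1.example:2222'],
        'ps': ['ps-0.example:2222'],
    },
    'task': {'type': 'worker', 'index': 1},
})

cluster, job_name, task_index = parse_tf_config(example)
# Address of the machine this process would be running on
print(job_name, task_index, cluster[job_name][task_index])
```

The cluster dictionary returned here is what would be passed to tf.train.ClusterSpec(cluster), while the job name and task index identify which entry in that cluster the current process is.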
Jeremy Lewi