Also, can someone list out detailed steps to train and deploy a TensorFlow model on Gcloud? I have my own code that I would prefer not to change. It seems like the code has to be in some sort of rigid format for it to be used on Gcloud, for example the task.py file, etc.
-
Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [on topic](http://stackoverflow.com/help/on-topic) and [how to ask](http://stackoverflow.com/help/how-to-ask) apply here. StackOverflow is not a design, coding, research, or tutorial service. – Prune Jul 21 '17 at 17:48
2 Answers
Let me see if I can help you -- this might require followup questions (which are welcome) beyond this high-level answer.
First the docs - hopefully you've seen https://cloud.google.com/ml-engine/docs/how-tos/training-steps which links to various topics that are relevant here.
Let me try to summarize some of the key things you want to keep in mind.
At a very high level, you need to author a Python program that accepts a set of command-line args, so the interface is fairly general. You do not need to name things task.py.
You do need to package up your Python code as well as declare dependencies, so they can be installed when your job runs in the cloud. (see https://cloud.google.com/ml-engine/docs/how-tos/packaging-trainer)
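As a concrete illustration, a minimal setup.py at the root of your project might look like the sketch below. The package name and the dependency list are placeholders, not anything ML Engine requires; the important parts are `packages` and `install_requires`:

```python
# setup.py -- a minimal sketch; name and install_requires are placeholders
from setuptools import setup, find_packages

setup(
    name='my-trainer',
    version='0.1',
    packages=find_packages(),     # finds any directory containing an __init__.py
    install_requires=['pandas'],  # pip dependencies installed on the cloud workers
)
```

Anything listed in `install_requires` gets pip-installed on each training VM before your module runs.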
In the case of distributed training, you'll want to use the TF_CONFIG environment variable to instantiate a TensorFlow server that can coordinate with other workers in your job. See https://cloud.google.com/ml-engine/docs/concepts/trainer-considerations
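TF_CONFIG is a JSON string set on every VM in the job; a stdlib-only sketch of parsing it (the hostnames and task assignment below are a hypothetical example of what one worker replica might see):

```python
import json
import os

# Cloud ML Engine sets TF_CONFIG on each replica; this is a hypothetical value
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {
        'master': ['master-0:2222'],
        'worker': ['worker-0:2222', 'worker-1:2222'],
        'ps': ['ps-0:2222'],
    },
    'task': {'type': 'worker', 'index': 1},
})

tf_config = json.loads(os.environ.get('TF_CONFIG', '{}'))
cluster_spec = tf_config.get('cluster', {})
task = tf_config.get('task', {'type': 'master', 'index': 0})

# With low-level TensorFlow you would then start a server for this replica, e.g.
#   server = tf.train.Server(tf.train.ClusterSpec(cluster_spec),
#                            job_name=task['type'], task_index=task['index'])
print(task['type'], task['index'])  # worker 1
```

The same parsing works for every replica; only the `task` entry differs between VMs.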
When you submit your job using the gcloud tool, you'll want to specify a cluster configuration.
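For example, a custom cluster can be described in a small YAML file passed via `--config` on the `gcloud ml-engine jobs submit training` command line. The machine types and counts below are just an illustration:

```yaml
# config.yaml -- passed as: gcloud ml-engine jobs submit training ... --config config.yaml
trainingInput:
  scaleTier: CUSTOM
  masterType: standard
  workerType: standard
  workerCount: 2
  parameterServerType: standard
  parameterServerCount: 1
```

For simple jobs you can skip the file and use a predefined tier such as `--scale-tier BASIC` instead.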
In the cloud, you'll want to read training data and write checkpoints, summaries and the resulting model from/to cloud storage, rather than local disk (which is transient). TensorFlow supports GCS in its file I/O APIs. See https://cloud.google.com/ml-engine/docs/how-tos/working-with-data
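Since TensorFlow's file I/O accepts gs:// URLs directly, in practice this mostly means taking your directories as flags instead of hard-coding local paths. A stdlib-only sketch of that pattern (the bucket name is hypothetical):

```python
import argparse
import posixpath  # GCS object names always use forward slashes

parser = argparse.ArgumentParser()
# --job-dir can point at local disk for testing or at a GCS bucket in the cloud
parser.add_argument('--job-dir', default='/tmp/census')
args = parser.parse_args(['--job-dir', 'gs://my-bucket/census/v1'])

# Derived output locations; TensorFlow's file APIs accept these strings as-is
checkpoint_dir = posixpath.join(args.job_dir, 'checkpoints')
export_dir = posixpath.join(args.job_dir, 'export')
print(checkpoint_dir)  # gs://my-bucket/census/v1/checkpoints
```

Conveniently, `gcloud ml-engine jobs submit training` passes `--job-dir` through to your program, so the same code runs unchanged locally and in the cloud.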
Finally, when you produce a model to use for deployment/prediction with ML Engine (if you need to), then make sure you use the SavedModel APIs - https://cloud.google.com/ml-engine/docs/how-tos/deploying-models
Hopefully this helps give you a broad overview.
Another thing that helps is understanding the code you do have - is it based on low-level TensorFlow APIs? Or is it based on Estimators? The latter simplifies many aspects (esp. distributed training).

-
I have read through the documentation, but I am not able to parse it very well. From what I understand: 1) You can have gcloud package your code by calling the `gcloud ml-engine jobs submit training --package-path=...` command. The problem I am facing is this: the folder structure of my project is as follows: parent folder: my_project; subfolders: my_project/code, my_project/data, my_project/models. Obviously my package path is my_project/code. However, when I try to run the main code inside that folder that creates a new model, gcloud says code/code_to_be_run.py does not exist – Jul 23 '17 at 18:22
-
Did you create a setup.py in your my_project directory? And then point gcloud at your my_project directory. – Nikhil Kothari Jul 23 '17 at 22:05
-
Yes. There is a setup.py file. It still says, "No module named code/code_to_be_run.py" – Jul 24 '17 at 17:13
-
The module to run would be code.code_to_be_run. Secondly, does the code directory have an __init__.py, and does setup.py include it in the list of packages? For example: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/mnist/trainable – Nikhil Kothari Jul 25 '17 at 00:15
I just went through this process myself for the first time 2 weeks ago. What I'd recommend is using this tutorial (created by the kind folks at Google).
I don't remember running into any big issues, but let me know if you hit any roadblocks and I might be able to help you out.
To change the prediction input from json to csv in the example from the above linked tutorial, you'll notice that the default given is 'JSON', but this can be changed to 'CSV' (source):
parser.add_argument(
    '--export-format',
    help='The input format of the exported SavedModel binary',
    choices=['JSON', 'CSV', 'EXAMPLE'],
    default='JSON'
)
This means you can specify --export-format 'CSV'
when you create the model. For example:
python trainer/task.py \
--train-files ~/Documents/data/adult.data.csv \
--eval-files ~/Documents/data/adult.test.csv \
--job-dir ~/Documents/models/census/v1 \
--train-steps 100 \
--verbosity 'DEBUG' \
--export-format 'CSV'
