Google Cloud ML Engine is a managed service for training machine learning models and serving predictions from them.
Questions tagged [google-cloud-ml]
1007 questions
9
votes
5 answers
How to remotely connect to GCP ML Engine/AWS Sagemaker managed notebooks?
GCP has finally released managed Jupyter notebooks. I would like to be able to interact with the notebook locally by connecting to it, i.e. I use PyCharm to connect to the externally configured Jupyter notebook server by passing its URL & token…

olive_tree
- 1,417
- 16
- 23
9
votes
2 answers
Upgrade to tf.dataset not working properly when parsing csv
I have a GCMLE experiment and I am trying to upgrade my input_fn to use the new tf.data functionality. I have created the following input_fn based off of this sample
def input_fn(...):
dataset =…

reese0106
- 2,011
- 2
- 16
- 46
9
votes
3 answers
Tensorflow fail with "Unable to get element from the feed as bytes." when attempting to restore checkpoint
I am using Tensorflow r0.12.
I use google-cloud-ml locally to run 2 different training jobs. In the first job, I find good initial values for my variables. I store them in a V2-checkpoint.
When I try to restore my variables for using them in the…

Thibaut Loiseleur
- 814
- 8
- 21
9
votes
1 answer
What to do with failed jobs?
In Google Cloud ML (Machine Learning), I submitted a job, but it failed due to a Python error in the code.
After fixing the error, how can I re-run the job? Should I submit a new job?
When I'm done, how do I delete the job?
The online documentation…

Androidification
- 203
- 3
- 11
8
votes
1 answer
Custom code containers for google cloud-ml for inference
I am aware that it is possible to deploy custom containers for training jobs on Google Cloud, and I have been able to get this running using the command:
gcloud ai-platform jobs submit training infer name --region some_region…

Inder
- 3,711
- 9
- 27
- 42
8
votes
1 answer
Encountering preemption OS Error when trying to run distributed GCMLE job
I am trying to run a distributed GCMLE training job and I keep getting the following error:
An error was raised. This may be due to a preemption in a connected worker or parameter server. The current session will be closed and a new session will be…

reese0106
- 2,011
- 2
- 16
- 46
8
votes
2 answers
How do I get a TensorFlow/Keras model that takes images as input to serve predictions on Cloud ML Engine?
There are multiple questions (examples: 1, 2, 3, 4, 5, 6, etc.) trying to address the question of how to handle image data when serving predictions for TensorFlow/Keras models in Cloud ML Engine.
Unfortunately, some of the answers are out-of-date…

rhaertel80
- 8,254
- 1
- 31
- 47
8
votes
4 answers
How to set the request timeout in google ml api python client?
I'm running online predictions on google cloud machine learning API using the google api python client and a model hosted for me at google cloud.
When I send one image for prediction, the round trip, including all traffic, takes about 40 seconds. When I…

Randolfo
- 647
- 1
- 8
- 18
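A minimal sketch for the timeout question above, not a confirmed fix: one commonly suggested workaround is to raise the process-wide socket default before the API client is built. It assumes the client's HTTP transport falls back to that default when no explicit timeout is given; the helper name and the 120-second value are illustrative only.

```python
import socket


def set_default_request_timeout(seconds):
    """Set the process-wide default socket timeout and return the applied value.

    Assumption: the HTTP transport used by the prediction client inherits
    this default when it is created without an explicit timeout.
    """
    socket.setdefaulttimeout(seconds)
    return socket.getdefaulttimeout()


# Call this before constructing the API client so new sockets pick it up.
applied = set_default_request_timeout(120.0)
print(applied)
```

Because this changes a global default, it affects every socket opened afterwards in the same process. Passing an explicitly configured HTTP object to the client is a narrower alternative.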
8
votes
3 answers
How to pass base64 encoded image to Tensorflow prediction?
I have a google-cloud-ml model that I can run predictions on by passing a 3-dimensional array of float32...
{ 'instances': [ { 'input' : '[ [ [ 0.0 ], [ 0.5 ], [ 0.8 ] ] ... ] ]' } ] }
However this is not an efficient format to transmit images, so I'd…

user3567174
- 1,898
- 2
- 15
- 18
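A hedged sketch for the base64 question above: it builds a JSON request body in which the raw image bytes are base64-encoded and wrapped in a `{"b64": ...}` object, which is the convention the prediction service documents for binary data as far as I know. The input name `image_bytes` and the helper name are hypothetical, and the served model is assumed to accept a string tensor and decode the image itself.

```python
import base64
import json


def build_b64_request(image_bytes, input_name="image_bytes"):
    """Return a prediction request body carrying one base64-encoded image.

    `input_name` is a hypothetical placeholder; it must match the input
    name of the exported model's serving signature.
    """
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {"instances": [{input_name: {"b64": encoded}}]}


# Stand-in bytes; in practice these would be read from a JPEG or PNG file.
body = build_b64_request(b"\x89PNG\r\n")
print(json.dumps(body))
```

Compared with sending a nested float array as text, this keeps the payload close to the size of the compressed image file.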
8
votes
2 answers
MonitoredTrainingSession writes more than one metagraph event per run
When writing checkpoint files using a tf.train.MonitoredTrainingSession it somehow writes multiple metagraphs. What am I doing wrong?
I stripped it down to the following code:
import tensorflow as tf
global_step = tf.Variable(0, dtype=tf.int32,…

Bastiaan
- 4,451
- 4
- 22
- 33
7
votes
2 answers
How to make Google Cloud AI Platform detect `tf.summary.scalar` calls during training?
(Note: I have also asked this question here)
Problem
I have been trying to get Google Cloud's AI platform to display the accuracy of a Keras model, trained on the AI platform. I configured the hyperparameter tuning with hptuning_config.yaml and it…

Julian Ferry
- 305
- 1
- 10
7
votes
2 answers
How to continue training an object detection model using Tensorflow Object Detection API?
I'm using Tensorflow Object Detection API to train an object detection model using transfer learning. Specifically, I'm using ssd_mobilenet_v1_fpn_coco from the model zoo, and using the sample pipeline provided, having of course replaced the…

Simon Labrecque
- 577
- 7
- 15
7
votes
1 answer
Data Normalization with tensorflow tf-transform
I'm doing a neural network prediction with my own datasets using Tensorflow. The first thing I did was build a model that works with a small dataset on my computer. After this, I changed the code a little bit in order to use Google Cloud ML-Engine with bigger…

Marc
- 165
- 3
- 13
7
votes
1 answer
No module named trainer
I have a very simple trainer that follows the sample directory structure:
/dist
    __init__.py
    setup.py
    /trainer
        __init__.py
        task.py
Under the /dist directory, it runs fine locally:
$ gcloud ml-engine local train
…

Neurus
- 657
- 4
- 27
7
votes
1 answer
reading files in google cloud machine learning
I tried to run tensorflow-wavenet on Google Cloud ML Engine with gcloud ml-engine jobs submit training, but the cloud job crashed when it tried to read the JSON configuration file:
with open(args.wavenet_params, 'r') as f:
    wavenet_params…

joaeba
- 73
- 1
- 4
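A minimal sketch for the file-reading question above, assuming the crash happens because the built-in `open` cannot read `gs://` paths. The helper names are hypothetical. For Cloud Storage paths it defers to `tf.io.gfile.GFile` from TensorFlow, imported lazily so local paths work without it.

```python
import json


def open_maybe_gcs(path, mode="r"):
    """Open a local path with open(), or a gs:// path with TensorFlow's GFile."""
    if path.startswith("gs://"):
        # Imported here so that purely local runs do not require TensorFlow.
        import tensorflow as tf
        return tf.io.gfile.GFile(path, mode)
    return open(path, mode)


def load_json_config(path):
    """Read a JSON configuration file from local disk or Cloud Storage."""
    with open_maybe_gcs(path, "r") as f:
        return json.load(f)
```

A call such as `load_json_config(args.wavenet_params)` would then accept either kind of path, provided the job has read access to the bucket.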