Google Cloud ML Engine (later renamed AI Platform) is a managed service for training machine learning models and serving predictions from them.
Questions tagged [google-cloud-ml]
1007 questions
3 votes, 1 answer
TensorFlow: Cannot call `tf.keras.Model.add_metric` when `tf.distribute.MirroredStrategy` is used
I have a model class that inherits from tf.keras.Model. I can train, evaluate, and export it using 8 GPUs, distributing it with tf.distribute.MirroredStrategy. However, I need custom metrics, and when I call the add_metric method, it throws an error…

Andy Carlson
3 votes, 3 answers
TensorFlow model serving on Google AI Platform online prediction too slow with instance batches
I'm trying to deploy a TensorFlow model to Google AI Platform for Online Prediction. I'm having latency and throughput issues.
The model runs on my machine in less than 1 second (with only an Intel Core I7 4790K CPU) for a single image. I deployed…

Nahuel Dallacamina
3 votes, 0 answers
How to speed up AI platform training job queues?
Whenever I submit a training job to the AI platform, I have to wait around 5-10 minutes for my training job to start after it is queued. This happens when I submit a package for training as well as when I submit a docker image.
The logs go something…

Julian Ferry
3 votes, 1 answer
ai-platform: No eval folder or export folder in outputs when running TensorFlow 2.1 training job using Estimators
The Problem
My code works locally, but I am not able to get any evaluation data or exports from my TensorFlow estimator when submitting online training jobs after having upgraded to TensorFlow 2.1. Here's the bulk of my code:
def…

sleepyowl
3 votes, 1 answer
Accessing Google Secret Manager from AI Platform training job with custom container
I am trying to access a secret stored in Google Secret Manager from an AI Platform Training job that runs in a custom container. I am using the following Python code to retrieve secrets:
# Standard library imports
import os
# Import the Secret…
3 votes, 1 answer
PyTorch model deployment in AI Platform
I'm deploying a PyTorch model to Google Cloud AI Platform and I'm getting the following error:
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to…

gogasca
3 votes, 3 answers
Cannot deploy trained model to Google Cloud AI Platform with custom prediction routine: Model requires more memory than allowed
I am trying to deploy a pretrained PyTorch model to AI Platform with a custom prediction routine. After following the instructions described here, the deployment fails with the following error:
ERROR: (gcloud.beta.ai-platform.versions.create) Create…

Dino Perić
3 votes, 2 answers
MultiWorkerMirroredStrategy() not working on Google AI-Platform (CMLE)
I'm getting the following error while using MultiWorkerMirroredStrategy() to train a custom Estimator on Google AI-Platform (CMLE).
ValueError: Unrecognized task_type: 'master', valid task types are: "chief", "worker", "evaluator" and "ps".
Both…

Swapnil Masurekar
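The error quoted in the question above rejects the task type 'master' and lists "chief", "worker", "evaluator" and "ps" as valid. One possible workaround is to rewrite the TF_CONFIG environment variable before the strategy is created. The following is only a sketch under stated assumptions: it assumes TF_CONFIG is a JSON object with "cluster" and "task" keys, and the function name rename_master_to_chief is hypothetical, not taken from the question or its answers.

```python
import json
import os


def rename_master_to_chief(tf_config_json):
    """Return TF_CONFIG JSON with the 'master' task type renamed to 'chief'.

    Hypothetical helper: assumes the usual {"cluster": ..., "task": ...} layout.
    """
    config = json.loads(tf_config_json)

    # Rename the cluster entry, keeping its host list unchanged.
    cluster = config.get("cluster", {})
    if "master" in cluster:
        cluster["chief"] = cluster.pop("master")

    # Rename the role of the current process if it is the master.
    task = config.get("task", {})
    if task.get("type") == "master":
        task["type"] = "chief"

    return json.dumps(config)


if __name__ == "__main__":
    # Apply the rewrite only when the platform has set TF_CONFIG.
    raw = os.environ.get("TF_CONFIG")
    if raw:
        os.environ["TF_CONFIG"] = rename_master_to_chief(raw)
```

The rewrite would need to run before any tf.distribute strategy is constructed, since the strategy reads TF_CONFIG at creation time.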
3 votes, 2 answers
How to write serving input function for Tensorflow model trained without using Estimators?
I have a model trained on a single machine without using Estimator, and I'm looking to serve the final trained model on Google Cloud AI Platform (ML Engine). I exported the frozen graph as a SavedModel using SavedModelBuilder and deployed it on the…

amityadav
3 votes, 2 answers
Requirements for launching Google Cloud AI Platform Notebooks with custom docker image
On AI Platform Notebooks, the UI lets you select a custom image to launch. If you do so, you're greeted with an info box saying that the container "must follow certain technical requirements":
I assume this means they have a required entrypoint,…

johnpaton
3 votes, 2 answers
Unknown Error Sending Data to Google Cloud ML Custom Prediction Routine
I am trying to write a custom ML prediction routine on AI Platform to get text data from a client, do some custom preprocessing, pass it into the model, and run the model. I was able to package and deploy this code on Google Cloud successfully.…

hockeybro
3 votes, 0 answers
Creating json instance for AI Platform from image for custom neural network
I recently created a custom neural network with the following code for basic architecture:
def gen_base_model(n_class):
cnn_model = InceptionResNetV2(include_top=False, input_shape=(width, width, 3), weights='imagenet')
inputs =…

Aniruddh Chandratre
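For the question above on building a JSON instance from an image, AI Platform online prediction expects a request body with an "instances" list, and binary values are base64-encoded and wrapped in an object with a "b64" key. The sketch below assumes the exported model accepts encoded image bytes, which may not hold for the network in the question. The input key "image_bytes" and the function name are hypothetical and would need to match the model's actual serving signature.

```python
import base64
import json


def image_bytes_to_request(image_bytes, input_key="image_bytes"):
    """Build a JSON request body holding one base64-encoded image.

    input_key is a hypothetical name; use the input name of your own model.
    """
    # JSON cannot carry raw bytes, so encode them as base64 text.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"instances": [{input_key: {"b64": encoded}}]})
```

A caller would typically read the image with open(path, "rb").read() and pass the resulting bytes to this function.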
3 votes, 1 answer
Gcloud ai-platform, can't create model with own prediction-class
I am following the AI Platform tutorial to upload a model and a prediction routine, but one step fails and I don't understand why.
My prediction class is the same as in their tutorial:
%%writefile predictor.py
import os
import pickle
import numpy as…

Hadrien Berthier
3 votes, 1 answer
Cloud ML Engine not working in command line and says it can't find a valid Python Path
I am trying to get predictions from a local model using the gcloud ai-platform command line tool, but I am getting the error "ERROR: (gcloud.ai-platform.local.predict) Something has gone really wrong; we can't find a valid Python executable on…

umar_a
3 votes, 1 answer
Python ml engine predict: How can I make a googleapiclient.discovery.build persistent?
I need to make online predictions from a model that is deployed in Cloud ML Engine. My Python code is similar to the one found in the docs (https://cloud.google.com/ml-engine/docs/tensorflow/online-predict):
service =…

Sergio Lobo
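One common way to avoid rebuilding a client on every prediction call, as the question above asks, is to create it once and reuse it. The sketch below is a generic lazy holder, not the approach from the question or its answer. The class name LazyClient is hypothetical, and the holder is not thread-safe as written.

```python
class LazyClient:
    """Create a client on first use and return the same object afterwards.

    Hypothetical helper; not thread-safe without an added lock.
    """

    def __init__(self, factory):
        self._factory = factory  # zero-argument callable that builds the client
        self._client = None

    def get(self):
        # Build the client only once, on the first call.
        if self._client is None:
            self._client = self._factory()
        return self._client


# Possible usage, assuming google-api-python-client is installed:
#   from googleapiclient import discovery
#   ml_service = LazyClient(lambda: discovery.build("ml", "v1"))
#   service = ml_service.get()
```

Keeping the holder at module level means the client survives across requests handled by the same process.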