Questions tagged [google-cloud-ml]

Google Cloud ML Engine is a managed service for training machine learning models and serving predictions from them.

1007 questions
3
votes
1 answer

Tensorflow — Cannot call `tf.keras.Model.add_metric` when `tf.distribute.MirroredStrategy` is used

I have a model class that inherits from tf.keras.Model. I can train, evaluate, and export it using 8 GPUs, distributing it with tf.distribute.MirroredStrategy. However, I need custom metrics, and when I call the add_metric method, it throws an error…
3
votes
3 answers

TensorFlow model serving on Google AI Platform online prediction too slow with instance batches

I'm trying to deploy a TensorFlow model to Google AI Platform for Online Prediction, and I'm having latency and throughput issues. On my machine, the model processes a single image in under 1 second (using only an Intel Core i7-4790K CPU). I deployed…
3
votes
0 answers

How to speed up AI platform training job queues?

Whenever I submit a training job to AI Platform, I have to wait around 5-10 minutes after it is queued before it starts. This happens both when I submit a package for training and when I submit a Docker image. The logs go something…
3
votes
1 answer

ai-platform: No eval folder or export folder in outputs when running TensorFlow 2.1 training job using Estimators

The Problem: My code works locally, but I cannot get any evaluation data or exports from my TensorFlow estimator when submitting online training jobs after upgrading to TensorFlow 2.1. Here's the bulk of my code: def…
sleepyowl
3
votes
1 answer

Accessing Google Secret Manager from AI Platform training job with custom container

I am trying to access a secret stored in Google Secret Manager from an AI Platform Training job that runs in a custom container. I am using the following Python code to retrieve secrets: # Standard library imports import os # Import the Secret…
3
votes
1 answer

PyTorch model deployment in AI Platform

I'm deploying a PyTorch model to Google Cloud AI Platform, and I'm getting the following error: ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: Model requires more memory than allowed. Please try to…
gogasca
3
votes
3 answers

Cannot deploy trained model to Google Cloud Ai-Platform with custom prediction routine: Model requires more memory than allowed

I am trying to deploy a pretrained PyTorch model to AI Platform with a custom prediction routine. After following the instructions described here, the deployment fails with the following error: ERROR: (gcloud.beta.ai-platform.versions.create) Create…
3
votes
2 answers

MultiWorkerMirroredStrategy() not working on Google AI-Platform (CMLE)

I'm getting the following error when using MultiWorkerMirroredStrategy() to train a custom Estimator on Google AI-Platform (CMLE): ValueError: Unrecognized task_type: 'master', valid task types are: "chief", "worker", "evaluator" and "ps". Both…
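The error quoted above suggests that the `TF_CONFIG` environment variable set by the platform names the coordinating task `master`, while the strategy only accepts `chief`. One commonly suggested workaround, sketched here under that assumption, is to rewrite `TF_CONFIG` before the strategy is created. The helper name `rename_master_to_chief` is hypothetical, and nothing in the excerpt confirms that this is the fix the asker ended up using.

```python
import json
import os


def rename_master_to_chief(tf_config_json: str) -> str:
    """Return a TF_CONFIG JSON string with the 'master' role renamed to 'chief'.

    Hypothetical helper: assumes the usual TF_CONFIG layout with
    'cluster' and 'task' keys.
    """
    config = json.loads(tf_config_json)

    # Rename the cluster entry, e.g. {"master": [...]} -> {"chief": [...]}.
    cluster = config.get("cluster", {})
    if "master" in cluster:
        cluster["chief"] = cluster.pop("master")

    # Rename this process's own task type if it is the master.
    task = config.get("task", {})
    if task.get("type") == "master":
        task["type"] = "chief"

    return json.dumps(config)


# Apply the rewrite only when the variable is actually set
# (it is absent for single-machine or local runs).
if "TF_CONFIG" in os.environ:
    os.environ["TF_CONFIG"] = rename_master_to_chief(os.environ["TF_CONFIG"])
```

The rewrite would need to run at the top of the training entry point, before `MultiWorkerMirroredStrategy()` is constructed, because the strategy reads `TF_CONFIG` when it is created.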
3
votes
2 answers

How to write serving input function for Tensorflow model trained without using Estimators?

I have a model that was trained on a single machine without using Estimator, and I want to serve the final trained model on Google Cloud AI Platform (ML Engine). I exported the frozen graph as a SavedModel using SavedModelBuilder and deployed it on the…
3
votes
2 answers

Requirements for launching Google Cloud AI Platform Notebooks with custom docker image

On AI Platform Notebooks, the UI lets you select a custom image to launch. If you do so, you're greeted with an info box saying that the container "must follow certain technical requirements": I assume this means they have a required entrypoint,…
3
votes
2 answers

Unknown Error Sending Data to Google Cloud ML Custom Prediction Routine

I am trying to write a custom ML prediction routine on AI Platform that receives text data from a client, applies some custom preprocessing, and passes the result to the model for inference. I was able to package and deploy this code on Google Cloud successfully.…
hockeybro
3
votes
0 answers

Creating json instance for AI Platform from image for custom neural network

I recently created a custom neural network with the following code for basic architecture: def gen_base_model(n_class): cnn_model = InceptionResNetV2(include_top=False, input_shape=(width, width, 3), weights='imagenet') inputs =…
3
votes
1 answer

Gcloud ai-platform, can't create model with own prediction-class

I am following the AI Platform tutorial to upload a model and a prediction routine, but one step fails and I don't understand why. My prediction class is the same as in their tutorial: %%writefile predictor.py import os import pickle import numpy as…
Hadrien Berthier
3
votes
1 answer

Cloud ML Engine not working in command line and says it can't find a valid Python Path

I am trying to get predictions from a local model using the gcloud ai-platform command line tool, but I am getting the error "ERROR: (gcloud.ai-platform.local.predict) Something has gone really wrong; we can't find a valid Python executable on…
umar_a
3
votes
1 answer

Python ml engine predict: How can I make a googleapiclient.discovery.build persistent?

I need to make online predictions from a model that is deployed in Cloud ML Engine. My Python code is similar to the example in the docs (https://cloud.google.com/ml-engine/docs/tensorflow/online-predict): service =…