Questions tagged [google-cloud-ml]

Google Cloud ML Engine is a managed service for training machine learning models and serving predictions from them.

1007 questions
6
votes
0 answers

Reducing provisioning time of Vertex AI Training (Custom Training Job)

I'm using the Vertex AI custom training feature on Google Cloud Platform (GCP) to train a model. But every time I trigger training, it takes 10 minutes before training actually starts because of provisioning time. Is there any way to reduce the…
6
votes
1 answer

Authenticating standalone gsutil in containers in Cloud ML Engine on Kubernetes with Workload Identity

I'm launching container images on Google Cloud AI Training (Cloud ML Engine). Inside those containers I need to use gsutil. Some containers have gsutil, and in that case I can use it right away without any authentication steps. Some containers do not…
Ark-kun
  • 6,358
  • 2
  • 34
  • 70
6
votes
2 answers

'Server Connection Error' on GCP (AI Platform Notebook)

I am facing some issues with GCP and the AI Platform (JupyterLab). It seems that I am unable to maintain a stable connection with the server for long. I keep getting 'server connection error' messages. From there two…
6
votes
1 answer

Speeding up TFRecords feed into Keras model on CloudML for GPU

I would like to feed TFRecords into my model at a very fast rate. However, currently, my GPU (a single K80 on GCP) is at 0% load, and training is very slow on CloudML. I have TFRecords in GCS: train_directory = gs://bucket/train/*.tfrecord, (around 100…
GRS
  • 2,807
  • 4
  • 34
  • 72
6
votes
2 answers

Converting google-cloud-ml github Reddit example from regression to classification and adding keys?

I've been trying to adapt the reddit_tft example from the cloud-ml GitHub samples repo to my needs. I've been able to get it running as per the tutorial readme. However, what I want to use it for is a binary classification problem and also output…
andrewm4894
  • 1,451
  • 4
  • 17
  • 37
6
votes
2 answers

In Tensorflow for serving a model, what is the serving input function supposed to do exactly

So, I've been struggling to understand what the main task of a serving_input_fn() is when a trained model is exported in Tensorflow for serving purposes. There are some examples online that explain it but I'm having problems defining it for…
6
votes
1 answer

Submitting Google Cloud ML Engine Jobs from Python Directly

I have a Keras .h5 model which I've been training locally, but I now wish to automate the full process via Google Cloud ML Engine. I have all the GCloud Storage buckets set up to be accessed from the application, and I have read about…
6
votes
2 answers

Base64 images with Keras and Google Cloud ML

I'm predicting image classes using Keras. It works in Google Cloud ML (GCML), but for efficiency I need to change it to pass base64 strings instead of a JSON array. I can easily run Python code to decode a base64 string into json…
user3567174
  • 1,898
  • 2
  • 15
  • 18
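
For readers skimming this entry, the request side of the base64 approach can be sketched with the standard library alone. This is a minimal sketch, not the asker's code: the helper names and the input key `image_bytes` are hypothetical, and the model is assumed to export an input whose name ends in `_bytes` so that the service decodes the `{"b64": ...}` wrapper into raw bytes.

```python
import base64
import json


def make_b64_instance(image_bytes, input_key="image_bytes"):
    """Wrap raw bytes in the {"b64": ...} form used for binary inputs.

    `input_key` is a hypothetical input name; it must match the name
    the exported model actually expects.
    """
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {input_key: {"b64": encoded}}


def make_request_body(images):
    """Build a JSON prediction request body with one instance per image."""
    return json.dumps({"instances": [make_b64_instance(b) for b in images]})


# Stand-in bytes; in practice these would be read from an image file.
fake_image = b"\x89PNG\r\n\x1a\n" + bytes(range(16))
body = make_request_body([fake_image])

# Round trip: decoding the payload recovers the original bytes.
payload = json.loads(body)
decoded = base64.b64decode(payload["instances"][0]["image_bytes"]["b64"])
```

Sending one base64 string per image is usually much smaller than a nested JSON array of pixel values, which appears to be the efficiency concern in the question. The model-side decoding still has to be added to the exported serving graph separately.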
6
votes
2 answers

Estimator predict infinite loop

I don't understand how to make a single prediction using TensorFlow Estimator API - my code results in an endless loop that keeps predicting for the same input. According to the documentation, the prediction is supposed to stop when input_fn raises…
traveh
  • 2,700
  • 3
  • 27
  • 44
6
votes
0 answers

Keras model to Tensorflow to input b64 encoded data instead of numpy ml-engine predict

I am trying to convert a keras model to use it for predictions on google cloud's ml-engine. I have a pre-trained classifier that takes in a numpy array as input. The normal working data I send to model.predict is named input_data. I convert it to…
6
votes
1 answer

Deploy retrained inception SavedModel to google cloud ml engine

I am trying to deploy a retrained version of the inception model on google cloud ml-engine. Gathering information from the SavedModel documentation, this reference, and this post by rhaertel80, I successfully exported my retrained model to a…
EffePi
  • 356
  • 2
  • 13
6
votes
1 answer

Google Cloud ML Tensorflow Version

The docs for setting up Google Cloud ML suggest installing Tensorflow version r0.11. I've observed that TensorFlow functions newly available in r0.12 raise exceptions when run on Cloud ML. Is there a timeline for Cloud ML supporting r0.12? Will…
johnmcs
  • 429
  • 4
  • 10
6
votes
1 answer

Is there any way Google App Engine apps can communicate with or control Machine Learning models or tasks?

I want to use Google's Machine Learning service with an App Engine application written in Python. This application should retrain TensorFlow models before every use, because of the investigative nature of the work (data clustering using Kohonen's SOM). I have…
5
votes
0 answers

Vertex AI Workbench notebooks unresponsive

I'm having various problems accessing GCP Vertex AI Workbench managed notebooks. I could really use some suggestions about recovering and avoiding further failures. The original behavior (two days ago) was: After working in the JupyterLab instance…
5
votes
0 answers

Tensorflow - Interpreting the tf.estimator.ProfilerHook "_Send" op

I have a deep CNN/RNN that I train on Google AI platform. I distribute the training on 8 GPUs using the tf.distribute.MirroredStrategy. I recently upgraded my runtime version from 1.13 to 1.15 and my training is more than 2x slower than before. I…
Andy Carlson
  • 3,633
  • 24
  • 43