Questions tagged [gcp-ai-platform-training]

143 questions
12
votes
2 answers

How can I mount a GCS bucket in a custom Docker image on AI Platform?

I'm using Google's AI Platform to train machine learning models using a custom Docker image. To run existing code without modifications, I would like to mount a GCS bucket inside the container. I think one way to achieve this is to install gcloud to…
7
votes
2 answers

How to make Google Cloud AI Platform detect `tf.summary.scalar` calls during training?

(Note: I have also asked this question here.) Problem: I have been trying to get Google Cloud's AI Platform to display the accuracy of a Keras model trained on AI Platform. I configured the hyperparameter tuning with hptuning_config.yaml and it…
6
votes
0 answers

GCP AI Platform job is stuck

I'm running a job on AI Platform, and it has been running for over an hour with no progress, no results, and almost no logs (only a few showing that it is running). Here are the region, machine type, and GPUs I was using: "region": "us-central1", "runtimeVersion": "2.2", …
5
votes
1 answer

Training using object detection api is not running on GPUs in AI Platform

I am trying to train some models with the TensorFlow 2 Object Detection API. I am using this command: gcloud ai-platform jobs submit training segmentation_maskrcnn_`date +%m_%d_%Y_%H_%M_%S` \ --runtime-version 2.1 \ --python-version…
5
votes
0 answers

Tensorflow - Interpreting the tf.estimator.ProfilerHook "_Send" op

I have a deep CNN/RNN that I train on Google AI platform. I distribute the training on 8 GPUs using the tf.distribute.MirroredStrategy. I recently upgraded my runtime version from 1.13 to 1.15 and my training is more than 2x slower than before. I…
Andy Carlson
4
votes
2 answers

Is it possible to connect to the private IP of a Cloud SQL instance in GCP Vertex AI pipeline?

I am working on a Vertex AI pipeline that transforms data residing in a Cloud SQL instance with a private IP in a different project. I cannot find any documentation on connecting to Cloud SQL from it. Does…
4
votes
3 answers

gcloud project owner permission denied

So I'm trying to run a training job on Google Cloud's AI Platform for an image classifier written in TensorFlow, from the command line: gcloud ai-platform jobs submit training my_job \ --module-name trainer.final_task \ …
3
votes
2 answers

GCP Vertex AI Endpoint returning empty prediction array

The KFP pipeline job executes successfully, but when I hit the endpoint I get an empty predictions array ([]). I suspect the issue is in the model upload, where the model is somehow not registered correctly. Any tips are appreciated. Code to…
3
votes
2 answers

Severities of all logs on AI Platform are errors

On Google AI Platform, all logs printed on stderr are interpreted as ERROR. Is there any way to print logs as INFO, WARNING, and CRITICAL?
3
votes
0 answers

How to speed up AI platform training job queues?

Whenever I submit a training job to the AI platform, I have to wait around 5-10 minutes for my training job to start after it is queued. This happens when I submit a package for training as well as when I submit a docker image. The logs go something…
3
votes
1 answer

How to use pandas-gbq with BigQuery Storage API within AI platform training?

I'm submitting a training job to the GCP AI Platform Training service. My training dataset (around 40M rows in a BigQuery table in the same GCP project) needs to be preprocessed at the beginning of the training job as a pandas DataFrame, so I tried…
3
votes
2 answers

How do you schedule GCP AI Platform notebooks via Google Cloud Composer?

I've been tasked with automating the scheduling of some notebooks on AI Platform Notebooks that run daily via the Papermill operator, but actually doing this through Cloud Composer is giving me some trouble. Any help is appreciated!
3
votes
3 answers

Cannot deploy trained model to Google Cloud Ai-Platform with custom prediction routine: Model requires more memory than allowed

I am trying to deploy a pretrained PyTorch model to AI Platform with a custom prediction routine. After following the instructions described here, the deployment fails with the following error: ERROR: (gcloud.beta.ai-platform.versions.create) Create…
3
votes
2 answers

MultiWorkerMirroredStrategy() not working on Google AI-Platform (CMLE)

I'm getting the following error while using MultiWorkerMirroredStrategy() to train a custom Estimator on Google AI-Platform (CMLE). ValueError: Unrecognized task_type: 'master', valid task types are: "chief", "worker", "evaluator" and "ps". Both…
2
votes
1 answer

How to build custom pipeline in GCP using Vertex AI

I was exploring the Vertex AI AutoML feature in GCP, which lets users import datasets and train, deploy, and predict with ML models. My use case is to do the data preprocessing on my own (I was not satisfied with AutoML's data preprocessing) and want to…