Pytorch on GCP: Machine type is not available on this endpoint

Question

I'm new to GCP, so pardon me if I'm asking about or missing something obvious here.

I'm trying to deploy a custom PyTorch model on GCP and create a version resource for it. Everything worked fine until I tried to create a new version of the model. Then I keep getting: INVALID_ARGUMENT: Machine type is not available on this endpoint.

I've tried switching between different types from their list here without luck. What am I missing?

Here's the script I run to deploy:

MODEL_NAME='test_iris'
MODEL_VERSION='v1'
RUNTIME_VERSION='2.4'
MODEL_CLASS='model.PyTorchIrisClassifier'
PYTORCH_PACKAGE="gs://${BUCKET_NAME}/packages/torch-1.8.1+cpu-cp37-cp37m-linux_x86_64.whl"

DIST_PACKAGE="gs://${BUCKET_NAME}/models/Test_model-0.1.tar.gz"
GCS_MODEL_DIR='models/'
REGION="europe-west1"


# Creating model on AI platform
gcloud alpha ai-platform models create ${MODEL_NAME} \
    --region=europe-west1 --enable-logging \
    --enable-console-logging

gcloud beta ai-platform versions create ${MODEL_VERSION} --model=${MODEL_NAME} \
    --origin=gs://${BUCKET_NAME}/${GCS_MODEL_DIR} \
    --python-version=3.7 \
    --machine-type=mls1-c4-m2 \
    --runtime-version=${RUNTIME_VERSION} \
    --package-uris=${DIST_PACKAGE},${PYTORCH_PACKAGE} \
    --prediction-class=${MODEL_CLASS}

Thanks!

Hi, since you specified a regional endpoint when creating your model (`--region=europe-west1`), you should specify the same endpoint when creating a version of the model, using the flag `--region=europe-west1`; see [here](https://cloud.google.com/ai-platform/prediction/docs/deploying-models#create_a_model_version). Also, the [documentation](https://cloud.google.com/ai-platform/prediction/docs/regional-endpoints#create-model-version) states that you can omit the `--region` flag from the `versions create` command only when the model was created on the global endpoint. Please let me know if it worked. — Alexandre Moraes, May 05 '21 at 11:59
Thanks for getting back. I tried setting the region on both `models create` and `versions create` without luck, but setting the region to global solved it. Now I get the error below, and I found multiple open threads about it. Any idea? ``` ERROR: Create Version failed. Bad model detected with error: "Failed to load model: User-provided package Test_model-0.1.tar.gz failed to install: Command '['python-default', '-m', 'pip', 'install', '--target=/tmp/custom_lib', '--no-cache-dir', '-b', '/tmp/pip_builds', '/tmp/custom_code/Test_model-0.1.tar.gz']' returned non-zero exit status 1 (Error code: 0)" ``` — Jh123, May 05 '21 at 13:33
Could you please post another question so we address one issue at a time? You can send the link to the question here. Also, in order to further contribute to the community I will post an answer based on the information I shared. I would appreciate if you can accept and upvote it. — Alexandre Moraes, May 06 '21 at 06:58

Alexandre Moraes · Accepted Answer · 2021-05-06T07:33:34.227

According to the documentation, you can only deploy a custom prediction routine when using a legacy (MLS1) machine type for your model version. However, you cannot use a regional endpoint with this type of machine, as the documentation states:

Regional endpoints only support Compute Engine (N1) machine types. You cannot use legacy (MLS1) machine types on regional endpoints.

As far as I can see, you specified a regional endpoint with the `--region` flag, and a regional endpoint does not support the machine type your use case requires. You therefore need to move the model and its version to the global endpoint, after which the error no longer occurs.
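The constraint can be expressed as a small pre-flight check that runs before any gcloud call. This is only a sketch: the function name `check_machine_endpoint` is hypothetical (it is not part of gcloud), it treats the string `global` as the name of the global endpoint, and it encodes nothing beyond the one rule quoted above.

```shell
#!/bin/sh
# Hypothetical helper, not part of gcloud: encodes only the rule quoted above.
# Usage: check_machine_endpoint MACHINE_TYPE ENDPOINT
# ENDPOINT is "global" or a region name such as "europe-west1".
check_machine_endpoint() {
    machine_type="$1"
    endpoint="$2"
    case "${machine_type}" in
        mls1-*)
            # Legacy (MLS1) machine types are not available on regional endpoints.
            if [ "${endpoint}" != "global" ]; then
                echo "ERROR: ${machine_type} requires the global endpoint, got ${endpoint}" >&2
                return 1
            fi
            ;;
    esac
    return 0
}

# The combination from the question fails the check; the global endpoint passes.
check_machine_endpoint mls1-c4-m2 europe-west1 || echo "rejected"
check_machine_endpoint mls1-c4-m2 global && echo "ok"
```

Any machine type that does not start with `mls1-` passes unchanged, since the quoted rule says nothing further about those.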

In addition, when you specify a regional endpoint with `gcloud ai-platform models create --region`, you need to specify the same region when creating the model's version. On the other hand, when you create a model on the global endpoint with `gcloud ai-platform models create --regions`, you can omit the region flag from the `gcloud ai-platform versions create` command. Note that the `--regions` flag is used only for the global endpoint.
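For the setup in the question, the two commands might then look as follows. This is a sketch under stated assumptions, not a verified deployment: `my-bucket` is a placeholder bucket name, the `RUN` variable is a hypothetical dry-run switch that defaults to `echo` so the commands are only printed, and every other flag is carried over unchanged from the question.

```shell
#!/bin/sh
# Sketch: create the model on the global endpoint, then add a version to it.
BUCKET_NAME="${BUCKET_NAME:-my-bucket}"   # placeholder value (assumption)
MODEL_NAME="test_iris"
MODEL_VERSION="v1"
RUNTIME_VERSION="2.4"
MODEL_CLASS="model.PyTorchIrisClassifier"
# Double quotes so that ${BUCKET_NAME} is expanded by the shell.
PYTORCH_PACKAGE="gs://${BUCKET_NAME}/packages/torch-1.8.1+cpu-cp37-cp37m-linux_x86_64.whl"
DIST_PACKAGE="gs://${BUCKET_NAME}/models/Test_model-0.1.tar.gz"
GCS_MODEL_DIR="models/"

# Hypothetical dry-run switch: "echo" prints each command; run with RUN= to execute.
RUN="${RUN-echo}"

create_model() {
    # --regions (plural) picks a region on the global endpoint.
    ${RUN} gcloud alpha ai-platform models create "${MODEL_NAME}" \
        --regions=europe-west1 \
        --enable-logging \
        --enable-console-logging
}

create_version() {
    # No region flag here: the model was created on the global endpoint.
    ${RUN} gcloud beta ai-platform versions create "${MODEL_VERSION}" \
        --model="${MODEL_NAME}" \
        --origin="gs://${BUCKET_NAME}/${GCS_MODEL_DIR}" \
        --python-version=3.7 \
        --machine-type=mls1-c4-m2 \
        --runtime-version="${RUNTIME_VERSION}" \
        --package-uris="${DIST_PACKAGE},${PYTORCH_PACKAGE}" \
        --prediction-class="${MODEL_CLASS}"
}

create_model
create_version
```

Printing the commands first makes it easy to confirm that the model uses `--regions` and that the version command carries no region flag before anything is actually created.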

Lastly, I must point out that, as per the documentation, when you select a region for the global endpoint by using the `--regions` flag when creating the model, your prediction nodes run in the specified region. However, the AI Platform Prediction infrastructure that manages your resources might not run in the same region.

@Jh123, please if you found the information useful, I would appreciate if you could accept and upvote the answer above. — Alexandre Moraes, May 06 '21 at 07:27
