I have deployed a Random Forest model on Google Cloud Platform (AI Platform, formerly ML Engine) to get online predictions. It normally takes around 15 s to get a response when I call the API with a single instance using the Google API client library. When I try with around 15 instances, the response time increases to more than 2 min. When this problem first appeared we set minNodes to 1 to ensure that a node was always ready; however, this did not resolve the issue.
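For reference, the call is made roughly like this (a simplified sketch; the project, model, and version names and the instance payload are placeholders, and the import is kept inside the function so the helper is self-contained):

```python
def version_path(project, model, version):
    """Build the fully qualified resource name of the deployed version."""
    return f"projects/{project}/models/{model}/versions/{version}"


def online_predict(project, model, version, instances):
    """Send a list of instances to the online prediction endpoint."""
    from googleapiclient import discovery  # Google API client library

    service = discovery.build("ml", "v1")
    response = service.projects().predict(
        name=version_path(project, model, version),
        body={"instances": instances},
    ).execute()
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]
```

The latency grows with the number of elements in the `instances` list, even though it is sent as a single request.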
I have tried to find a solution in the Google documentation, but it is not really clear. For example, I have activated console logging (--enable-console-logging) but I can't see any logs.
I would appreciate any suggestion to reduce the prediction latency.
Here are the commands used to deploy the model:
gcloud beta ai-platform models create $MODEL_NAME \
--regions $REGION \
--enable-console-logging
gcloud alpha ai-platform versions create $VERSION_NAME \
--log-http \
--model $MODEL_NAME \
--runtime-version 1.15 \
--python-version "3.7" \
--origin gs:/$patch_custom/ \
--package-uris gs:/$patch_custom/ML_MODEL_RAR-0.0.tar.gz \
--prediction-class AQ_Service.predictor.rf_predictor \
--machine-type mls1-c4-m4 \
--config CONFIG.YAML
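The prediction class passed via --prediction-class follows the custom prediction routine interface (a from_path class method and a predict method). A minimal sketch of its shape, assuming the model is stored as a pickle file (the class and file names here are illustrative, not the exact code of AQ_Service.predictor.rf_predictor):

```python
import os
import pickle


class RFPredictor:
    """Sketch of an AI Platform custom prediction routine for a
    scikit-learn Random Forest model."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # AI Platform calls this with the "instances" list from the
        # prediction request and expects a JSON-serializable result.
        return self._model.predict(instances).tolist()

    @classmethod
    def from_path(cls, model_dir):
        # Called once when a node starts up; keep all loading work here,
        # not in predict(), so per-request latency stays low.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            return cls(pickle.load(f))
```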
CONFIG.YAML:
autoScaling:
  minNodes: 1  # The minimum number of nodes to allocate for this model.
Note: it uses machine type mls1-c4-m4 because this was the only way to resolve problems with the size of the model without using another machine type that only supports TensorFlow.
Similar issue: High latency issue of online prediction