I have deployed a Random Forest model on Google Cloud Platform (AI Platform, formerly ML Engine) to get online predictions. It normally takes around 15 s to get a response when I call the API with a single instance using the Google API client library. When I try with around 15 instances, the response time increases to more than 2 min. When this problem first appeared we set minNodes to 1 to ensure that a node was always ready; however, this did not resolve the issue.
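For reference, the call is made roughly like this (a simplified sketch; the project, model, and version names and the instance payload are placeholders, and the import is kept inside the function so the helper is self-contained):

```python
def version_path(project, model, version):
    """Build the fully qualified resource name of the deployed version."""
    return f"projects/{project}/models/{model}/versions/{version}"


def online_predict(project, model, version, instances):
    """Send a list of instances to the online prediction endpoint."""
    from googleapiclient import discovery  # Google API client library

    service = discovery.build("ml", "v1")
    response = service.projects().predict(
        name=version_path(project, model, version),
        body={"instances": instances},
    ).execute()
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]
```

The latency grows with the number of elements in the `instances` list, even though it is sent as a single request.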
I have tried to find a solution in the Google documentation, but it is not really clear. For example, I have activated console logging (--enable-console-logging) but I can't see any logs.
I would appreciate any suggestion to reduce the prediction latency.
Here are the commands used to deploy the model:
gcloud beta ai-platform models create $MODEL_NAME \
--regions $REGION \
--enable-console-logging
gcloud alpha ai-platform versions create $VERSION_NAME \
--log-http \
--model $MODEL_NAME \
--runtime-version 1.15 \
--python-version "3.7" \
--origin gs:/$patch_custom/ \
--package-uris gs:/$patch_custom/ML_MODEL_RAR-0.0.tar.gz \
--prediction-class AQ_Service.predictor.rf_predictor \
--machine-type mls1-c4-m4 \
--config CONFIG.YAML
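The prediction class passed via --prediction-class follows the custom prediction routine interface (a from_path class method and a predict method). A minimal sketch of its shape, assuming the model is stored as a pickle file (the class and file names here are illustrative, not the exact code of AQ_Service.predictor.rf_predictor):

```python
import os
import pickle


class RFPredictor:
    """Sketch of an AI Platform custom prediction routine for a
    scikit-learn Random Forest model."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # AI Platform calls this with the "instances" list from the
        # prediction request and expects a JSON-serializable result.
        return self._model.predict(instances).tolist()

    @classmethod
    def from_path(cls, model_dir):
        # Called once when a node starts up; keep all loading work here,
        # not in predict(), so per-request latency stays low.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            return cls(pickle.load(f))
```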
CONFIG.YAML:
autoScaling:
  minNodes: 1  # The minimum number of nodes to allocate for this model.
Note: it uses machine type mls1-c4-m4 because this was the only way to resolve problems with the size of the model without using another machine type that only supports TensorFlow.
Similar issue: High latency issue of online prediction