Reaching batch prediction quota limit when not submitting that many batch predictions

Question

I'm running Vertex AI batch predictions with a custom XGBoost model, with Explainable AI enabled using Shapley values.

The explanation part is quite computationally intensive, so I've tried splitting the input dataset into chunks and submitting 5 batch prediction jobs in parallel. When I do this I receive the error "Quota exhausted. Please reach to ai-platform-unified-feedback@google.com for batch prediction quota increase".

I don't understand why I'm hitting the quota. According to the docs there is a limit on the number of concurrent jobs for AutoML models but it doesn't mention custom models.

Is the quota perhaps on the number of instances the batch predictions are running on? I'm using an n1-standard-8 instance for my predictions.

I've tried changing the instance type and launching fewer jobs in parallel, but I still get the same error.

could you specify the region and number of instances you're using for your custom model? — Sakshi Gatyan, Dec 06 '22 at 13:54

score 0 · Answer 1 · answered Dec 09 '22 at 12:32

According to the Google documentation for Vertex AI, the quota for custom models is on the number of concurrent machines running in the specified region. You can request a quota increase by following the instructions in the error message.

For more information on custom-trained model quotas refer to this documentation.

score 0 · Accepted Answer · answered Jan 04 '23 at 08:46

After reaching out to Google support regarding this issue, it was explained to me that the quota is based on the number of vCPUs used in the batch prediction job. The formula to calculate this is:

vCPUs per machine * number of machines (* 3 if explanations are enabled, because a separate node is spun up in that case, which requires additional resources)

For example, running a batch prediction with explanations on 50 e2-standard-4 machines uses 50 * 4 * 3 = 600 vCPUs in total.

The default quota for a Google project is 2,200 vCPUs in the europe-west2 region. This limit is not visible in the user's own Google project; it sits in a hidden project visible only to Google engineers. You therefore need to raise a support ticket if you need the quota increased.
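To check ahead of time whether a set of parallel jobs is likely to hit this limit, the formula above can be sketched in a few lines of Python. This is only an estimate based on what support told me, not an official API: the function names are made up for illustration, the factor of 3 and the 2,200 vCPU default are the figures quoted above (the default was given for europe-west2 and may differ elsewhere), and the 20 machines per job in the second example is a hypothetical number.

```python
# Rough estimate of the vCPUs counted against the batch prediction quota,
# using the formula relayed by Google support (not an official API).

EXPLANATION_MULTIPLIER = 3   # factor quoted by support when explanations are enabled
DEFAULT_QUOTA_VCPUS = 2200   # default quoted for europe-west2; may differ per region


def job_vcpus(vcpus_per_machine: int, machine_count: int, explanations: bool) -> int:
    """vCPUs attributed to one batch prediction job."""
    multiplier = EXPLANATION_MULTIPLIER if explanations else 1
    return vcpus_per_machine * machine_count * multiplier


def total_vcpus(jobs: list[tuple[int, int, bool]]) -> int:
    """Sum of vCPUs over jobs running at the same time."""
    return sum(job_vcpus(*job) for job in jobs)


# The example from the answer: 50 e2-standard-4 machines (4 vCPUs each), explanations on.
single_job = job_vcpus(vcpus_per_machine=4, machine_count=50, explanations=True)
print(single_job)  # 600

# Hypothetical: 5 parallel jobs, each on 20 n1-standard-8 machines (8 vCPUs each).
parallel_jobs = [(8, 20, True)] * 5
needed = total_vcpus(parallel_jobs)
print(needed, needed <= DEFAULT_QUOTA_VCPUS)  # 2400 False
```

In the hypothetical second case, only five jobs are submitted, yet the estimate comes to 2,400 vCPUs, which is over the default. That would explain seeing the quota error without submitting many jobs.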
