I'm currently running batch predictions on Vertex AI with a custom FastAPI container and manualBatchTuningParameters set to {"batch_size": 2}. My JSONL input file contains 646 instances; most of the predictions succeed, except for a few that fail with the following error:
('Post request fails. Cannot decode the prediction response
...<long and seemingly valid json>...
Error: Unterminated string starting at: line 1 column 97148 (char 97147)', 2)
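For reference, this is roughly how I create the job with the Python SDK (the project, bucket, model ID, machine type, and display name below are placeholders, and I'm assuming a recent google-cloud-aiplatform version where batch_predict exposes a batch_size argument that maps to manualBatchTuningParameters.batchSize):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder resource name for my custom-container model.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

job = model.batch_predict(
    job_display_name="fastapi-batch-predict",
    gcs_source="gs://my-bucket/instances.jsonl",        # 646 JSONL lines
    gcs_destination_prefix="gs://my-bucket/predictions/",
    instances_format="jsonl",
    predictions_format="jsonl",
    machine_type="n1-standard-4",
    # Maps to manualBatchTuningParameters.batchSize in the REST API.
    batch_size=2,
    sync=True,
)
```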
Because the error consistently points to roughly the same character offset (char 97147), it looks like the response is being truncated before the stream has been completely received by the batch "airflow worker." Since TCP is a streaming protocol, I suspect the batch interface is only reading part of the buffered response before it tries to decode the JSON.
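To narrow down whether the container itself ever sends a short body, my plan is to log the serialized response size on the server side and compare it with the offset in the error. A minimal sketch of that idea (the route and the run_model stub are simplified stand-ins for my actual handler):

```python
import json
import logging

from fastapi import FastAPI, Request

app = FastAPI()
logger = logging.getLogger("predict")


def run_model(instances: list) -> list:
    # Stand-in for the real inference code in my container.
    return [{"score": 0.0} for _ in instances]


@app.post("/predict")
async def predict(request: Request) -> dict:
    body = await request.json()
    instances = body.get("instances", [])

    response = {"predictions": run_model(instances)}

    # Log the exact size of the body about to be returned, so it can be
    # compared against the offset where the batch worker stops decoding
    # (char 97147 in the error above).
    encoded = json.dumps(response).encode("utf-8")
    logger.info("returning %d bytes for %d instances", len(encoded), len(instances))
    return response
```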
I've attempted to reproduce the error by deploying the same model to a Vertex AI endpoint and requesting the same predictions that errored in batch mode, along the lines of the sketch below.
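The reproduction attempt looks roughly like this (the endpoint ID and file name are placeholders; failed_instances.jsonl holds the JSONL lines whose batch predictions errored):

```python
import json

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# The instances that failed in batch mode.
with open("failed_instances.jsonl") as f:
    instances = [json.loads(line) for line in f]

# Send them in pairs to mirror manualBatchTuningParameters.batchSize = 2.
for i in range(0, len(instances), 2):
    response = endpoint.predict(instances=instances[i:i + 2])
    print(len(response.predictions), "predictions returned")
```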
Why am I occasionally getting this error?