I'm currently running batch predictions on Vertex AI with a custom FastAPI container and manualBatchTuningParameters set to {"batch_size": 2}. My JSONL input file contains 646 instances; most of the predictions succeed, except for a few that fail with the following error:
('Post request fails. Cannot decode the prediction response
...<long and seemingly valid json>...
Error: Unterminated string starting at: line 1 column 97148 (char 97147)', 2)
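For reference, this is roughly how I create the job with the Python SDK (the project, bucket, model ID, machine type, and display name below are placeholders, and I'm assuming a recent google-cloud-aiplatform version where batch_predict exposes a batch_size argument that maps to manualBatchTuningParameters.batchSize):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder resource name for my custom-container model.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

job = model.batch_predict(
    job_display_name="fastapi-batch-predict",
    gcs_source="gs://my-bucket/instances.jsonl",        # 646 JSONL lines
    gcs_destination_prefix="gs://my-bucket/predictions/",
    instances_format="jsonl",
    predictions_format="jsonl",
    machine_type="n1-standard-4",
    # Maps to manualBatchTuningParameters.batchSize in the REST API.
    batch_size=2,
    sync=True,
)
```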
Because the error consistently points to roughly the same character offset (char 97147), it looks like the response is being truncated before the stream has been completely received by the batch "airflow worker." Since TCP is a streaming protocol, I suspect the batch interface is only reading part of the buffered response before it tries to decode the JSON.
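To narrow down whether the container itself ever sends a short body, my plan is to log the serialized response size on the server side and compare it with the offset in the error. A minimal sketch of that idea (the route and the run_model stub are simplified stand-ins for my actual handler):

```python
import json
import logging

from fastapi import FastAPI, Request

app = FastAPI()
logger = logging.getLogger("predict")


def run_model(instances: list) -> list:
    # Stand-in for the real inference code in my container.
    return [{"score": 0.0} for _ in instances]


@app.post("/predict")
async def predict(request: Request) -> dict:
    body = await request.json()
    instances = body.get("instances", [])

    response = {"predictions": run_model(instances)}

    # Log the exact size of the body about to be returned, so it can be
    # compared against the offset where the batch worker stops decoding
    # (char 97147 in the error above).
    encoded = json.dumps(response).encode("utf-8")
    logger.info("returning %d bytes for %d instances", len(encoded), len(instances))
    return response
```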
I've attempted to reproduce the error by deploying the same model to a Vertex AI endpoint and requesting the same predictions that errored in batch mode, along the lines of the sketch below.
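The reproduction attempt looks roughly like this (the endpoint ID and file name are placeholders; failed_instances.jsonl holds the JSONL lines whose batch predictions errored):

```python
import json

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# The instances that failed in batch mode.
with open("failed_instances.jsonl") as f:
    instances = [json.loads(line) for line in f]

# Send them in pairs to mirror manualBatchTuningParameters.batchSize = 2.
for i in range(0, len(instances), 2):
    response = endpoint.predict(instances=instances[i:i + 2])
    print(len(response.predictions), "predictions returned")
```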
Why am I occasionally getting this error?