I'm using Google AutoML Translation with custom models. It works fine but there is one issue that bothers me.
If I do not use a language pair (e.g. English-Spanish) for a while, there is a cold start and the first request takes more than 10 seconds, whereas the next call is very quick and takes around 0.5 s. My application supports many language pairs, and the issue affects each of them.
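To make the behaviour concrete, here is a minimal sketch of how the cold versus warm latency could be measured per language pair. It does not call the AutoML API: `translate` is a hypothetical callable that would wrap the real request, and `make_fake_translate` is only a stand-in (with made-up delays) that imitates a slow first call so the sketch runs on its own.

```python
import time
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def time_calls(translate: Callable[[str, str, str], str],
               pairs: List[Pair],
               text: str = "Hello",
               repeats: int = 2) -> Dict[Pair, List[float]]:
    """Call `translate` `repeats` times per pair and record each latency in seconds."""
    timings: Dict[Pair, List[float]] = {}
    for source, target in pairs:
        durations = []
        for _ in range(repeats):
            start = time.perf_counter()
            translate(text, source, target)
            durations.append(time.perf_counter() - start)
        timings[(source, target)] = durations
    return timings


def make_fake_translate(cold_delay: float = 0.1,
                        warm_delay: float = 0.005) -> Callable[[str, str, str], str]:
    """Hypothetical stand-in for the real API call: slow on the first use of a pair."""
    warm = set()

    def translate(text: str, source: str, target: str) -> str:
        pair = (source, target)
        # Assumed delays for illustration only; real latencies will differ.
        time.sleep(warm_delay if pair in warm else cold_delay)
        warm.add(pair)
        return text  # the stand-in does not actually translate

    return translate
```

Calling `time_calls(make_fake_translate(), [("en", "es"), ("en", "de")])` returns two timings per pair, the first noticeably larger than the second, which is the pattern I see with the real service.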
I didn't find any information about the cold start or readiness probes in the documentation.
The question is: Is it possible to somehow avoid a cold start?