I have deployed an MNIST model with TF Serving on GCP AI Platform and am running into scaling issues. Has anyone faced a similar problem, and how did you resolve it?
Behavior
- At 3 requests per second, the model returns predictions correctly on a single core.
- If I increase the load to 1000 requests per second, I get either `"code": 403, "message": "Request had insufficient authentication scopes."` or `javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake`.
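For reference, the kind of client-side retry logic I would expect to need at this request rate looks roughly like this (a minimal Python sketch, not my actual Java client; `send` stands in for the real HTTP call to the prediction endpoint). Exponential backoff with jitter keeps 1000 concurrent clients from all retrying at the same instant after a transient failure such as a dropped TLS handshake:

```python
import random
import time


def predict_with_backoff(send, payload, max_retries=5, base_delay=0.01):
    """Call `send(payload)` and retry transient failures with
    exponential backoff plus jitter.

    `send` is a placeholder for the real request function (e.g. an
    HTTPS POST to the AI Platform predict URL with a bearer token).
    """
    for attempt in range(max_retries):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Delay grows as base * 2^attempt, plus up to 50 ms jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.05)
            time.sleep(delay)
```

This only papers over transient errors; the persistent 403s above suggest a credentials/scopes problem rather than pure overload.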
Another model version was definitely working with my Java client, and it still works from the GCP "Test & use" UI, but it stopped working once I tried scaling to 1000 requests/sec. This is on an n1-highmem-2 machine. It returns this error:
```json
{
  "error": {
    "code": 400,
    "message": "Request contains an invalid argument.",
    "errors": [
      {
        "message": "Request contains an invalid argument.",
        "domain": "global",
        "reason": "badRequest"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
```
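As far as I understand the docs, AI Platform online prediction expects a JSON body of the form `{"instances": [...]}`, and a malformed body is one common cause of `400 INVALID_ARGUMENT`. A small sketch of building that body (assuming the deployed SavedModel takes flattened 28×28 MNIST inputs; adjust to the model's actual signature):

```python
import json


def build_predict_body(images):
    """Build the JSON body for AI Platform online prediction:
    {"instances": [<one input per prediction>]}.

    Assumption: each image is a flat list of 784 floats (a flattened
    28x28 MNIST image); the real input shape depends on the model.
    """
    if not images:
        raise ValueError("at least one instance is required")
    return json.dumps({"instances": images})


# Example: a batch of two blank images.
body = build_predict_body([[0.0] * 784, [0.0] * 784])
```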
One more question: is there any advantage to serving a model via GCP AI Platform (TF Serving) versus deploying it yourself on a VM? Thanks for the help.