
I have my application running in GAE. This application makes a REST call to my CloudML model.

Here is the code:

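        import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
        import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
        import com.google.api.client.http.GenericUrl;
        import com.google.api.client.http.HttpRequest;
        import com.google.api.client.http.HttpRequestFactory;
        import com.google.api.client.http.HttpRequestInitializer;
        import com.google.api.client.http.HttpResponse;
        import com.google.api.client.http.HttpTransport;
        import com.google.api.client.http.json.JsonHttpContent;
        import com.google.api.client.json.jackson2.JacksonFactory;
        import java.io.ByteArrayOutputStream;
        import java.util.Collections;

        // CLOUDML_SCOPE, predictRestUrl, getPayLoad() and LOG are defined elsewhere in my class.
        GoogleCredential credential = GoogleCredential.getApplicationDefault()
                .createScoped(Collections.singleton(CLOUDML_SCOPE));
        HttpTransport httpTransport = GoogleNetHttpTransport.newTrustedTransport();
        HttpRequestInitializer requestInitializer = request -> {
            credential.initialize(request);
            request.setReadTimeout(0); // 0 = no read timeout (wait indefinitely)
        };

        HttpRequestFactory requestFactory = httpTransport.createRequestFactory(
                requestInitializer);

        GenericUrl url = new GenericUrl(predictRestUrl);

        // Wrap the payload as {"instances": ...}, the body format of the predict REST call.
        JacksonFactory jacksonFactory = new JacksonFactory();
        JsonHttpContent jsonHttpContent = new JsonHttpContent(jacksonFactory, getPayLoad());

        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        jsonHttpContent.setWrapperKey("instances");
        // Serialize once only to log the body; it is serialized again when the request is sent.
        jsonHttpContent.writeTo(baos);
        LOG.info("Executing request... " + baos.toString());
        HttpRequest request = requestFactory.buildPostRequest(url, jsonHttpContent);

        HttpResponse response = request.execute();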

I've set the read timeout to 0 (no timeout) because I was often getting a read timeout exception.

With this code, I now often get the following error response from CloudML:

    com.google.api.client.http.HttpResponseException: 500 Internal Server Error
    {
      "error": {
        "code": 500,
        "message": "Internal error encountered.",
        "errors": [
          {
            "message": "Internal error encountered.",
            "domain": "global",
            "reason": "backendError"
          }
        ],
        "status": "INTERNAL"
      }
    }

Where can we find the logs for REST calls to CloudML? How can I debug this further?

sag

1 Answer


We worked with @sag and determined that the 500 error is the result of a timeout caused by a long "cold start". If you haven't sent traffic to your model in a while, or if you send enough traffic that we need to spin up more instances, you will hit a cold start, during which one or more new instances are started. Currently, this can be a lengthy process that sometimes times out on our end, which results in a 500 error.

These errors can safely be retried; we recommend using exponential backoff.
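A minimal sketch of retrying with exponential backoff in plain Java. It assumes the caller wraps `request.execute()` in a `Callable`, catches `HttpResponseException`, and rethrows it as the hypothetical `RetryableException` below when `getStatusCode() >= 500`. The class name, method name, and delay parameters are illustrative choices, not part of any Google API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative retry helper; not part of any Google client library. */
class Backoff {

    /** Hypothetical marker for failures that are safe to retry (e.g. a 500 "backendError"). */
    static class RetryableException extends Exception {
        RetryableException(String message) {
            super(message);
        }
    }

    /**
     * Calls {@code action} up to {@code maxAttempts} times (at least once).
     * After failed attempt n (counting from 0) it sleeps a random time in
     * [0, min(maxDelayMs, initialDelayMs * 2^n)] ms ("full jitter").
     * Non-retryable exceptions propagate immediately.
     */
    static <T> T callWithBackoff(Callable<T> action, int maxAttempts,
                                 long initialDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 0; ; attempt++) {
            try {
                return action.call();
            } catch (RetryableException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: surface the last retryable error
                }
                // Exponential cap, with the shift bounded to avoid long overflow.
                long cap = Math.min(maxDelayMs, initialDelayMs << Math.min(attempt, 30));
                long sleepMs = ThreadLocalRandom.current().nextLong(cap + 1);
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

Randomizing the delay ("jitter") keeps many clients from retrying at the same instant while the model is still cold-starting. If you would rather stay within google-http-client, `HttpBackOffUnsuccessfulResponseHandler` combined with `ExponentialBackOff`, installed via `HttpRequest.setUnsuccessfulResponseHandler`, provides similar retry behavior without a custom helper.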

rhaertel80