
I have a couple of Spring Boot services that work perfectly locally, but they restart after a random amount of time on Google App Engine Flexible. These services use Google Cloud SQL and Pub/Sub via the Spring Boot libraries.

When I deploy the services they work fine, but after a while they get restarted by App Engine. I hooked them up to an instance of Spring Boot Admin and I can't see anything strange in the heap usage, disk space, or other metrics. Sometimes they reboot after a few hours, sometimes sooner.

I tried bumping the logging level to TRACE, and it seems the services get restarted even faster.

I also noticed that the health check gets called a lot, even though the default health check configuration in the App Engine app.yaml reference says it should run every 5 minutes and only restart after a couple of consecutive failed checks. But I never see any failed health checks in the logs.

What I do see is that the health checks succeed (200 responses), then suddenly the logging stops, and a couple of minutes later I see

Start command: java -showversion -agentpath:/opt/cdbg/cdbg_java_agent.so=--log_dir

which means that App Engine is trying to start the app again.

The app.yaml looks like:

runtime: java
env: flex
service: x-service

resources:
  memory_gb: 1.0

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 2

env_variables:
  SPRING_PROFILES_ACTIVE: "dev"

liveness_check:
  path: "/actuator/health"

readiness_check:
  path: "/actuator/health"

1 Answer


So, after doing some research, this might be helpful for your problem.

I also noticed that the health check gets called a lot, even though the default health check configuration in the App Engine app.yaml reference says it should run every 5 minutes and only restart after a couple of consecutive failed checks. But I never see any failed health checks in the logs.

This is normal behavior because of Google's redundant health checkers:

" Health check frequency

To ensure high availability, App Engine creates redundant copies of each health checker. If a health checker fails, a redundant one can take over with no delay.

If you examine the nginx.health_check logs for your application, you might see health check polling happening more frequently than you have configured, due to the redundant health checkers that are also following your settings. These redundant health checkers are created automatically and you cannot configure them."
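If you want to see exactly which intervals and thresholds apply to your app, you can set them explicitly in app.yaml instead of relying on the defaults. A minimal sketch (the field names come from the Flexible environment app.yaml reference; the values here are only illustrative, not necessarily the documented defaults):

liveness_check:
  path: "/actuator/health"
  check_interval_sec: 30      # how often the liveness check is polled
  timeout_sec: 4              # how long to wait for a response
  failure_threshold: 4        # consecutive failures before the instance is restarted
  success_threshold: 2        # consecutive successes before it counts as healthy again
  initial_delay_sec: 300      # grace period after instance start before failures count

readiness_check:
  path: "/actuator/health"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300  # how long the app may take to become ready on deploy

Even with explicit settings, the redundant health checkers described above will still make the polling in the nginx.health_check logs look more frequent than check_interval_sec alone suggests.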

I have a couple of Spring Boot services that work perfectly locally, but they restart after a random amount of time on Google App Engine Flexible. These services use Google Cloud SQL and Pub/Sub via the Spring Boot libraries.

When I deploy the services they work fine, but after a while they get restarted by App Engine. I hooked them up to an instance of Spring Boot Admin and I can't see anything strange in the heap usage, disk space, or other metrics. Sometimes they reboot after a few hours, sometimes sooner.

Looking at the way GAE manages instances, this might be normal behavior as long as your application keeps responding to requests. By using automatic_scaling you're defining dynamic instances, and dynamic instances are turned on and off depending on the load they receive. So what you might be seeing is auto-scaling at work: going up to 2 instances, then back down to one, and so on.

I would invite you to test this out by either increasing the default target_utilization threshold to something like 0.9 and checking whether it still scales frequently, or by switching to manual_scaling so that you only have resident instances. The reason for this test is that the logs you're seeing might simply be the expected behavior of App Engine instance management: since your app responds OK to the liveness and readiness checks, and you mention that memory utilization shows nothing out of the norm, I can't think of anything else that could be causing this besides the auto-scaling functionality.
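A minimal sketch of the two scaling variants to test with (only the scaling sections are shown, the values are examples, and the rest of your app.yaml stays as it is):

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 2
  cpu_utilization:
    target_utilization: 0.9   # scale out later than the default, so scaling events become rarer

or, to rule autoscaling out entirely:

manual_scaling:
  instances: 1                # a single resident instance that App Engine will not scale down

If the unexpected restarts stop under manual_scaling, that points at instance management rather than at your application.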

I hope this helps!

Sources:

https://cloud.google.com/appengine/docs/flexible/java/reference/app-yaml
https://cloud.google.com/appengine/docs/flexible/custom-runtimes/configuring-your-app-with-app-yaml
https://cloud.google.com/appengine/docs/flexible/java/how-instances-are-managed

  • Hi, the service doesn't seem to keep responding to requests. We even changed the settings to min and max 1 instance and the service still goes down; even with min 1 and max 2 instances it goes down. When we look in Stackdriver, the only CPU spikes we see are during startup. – Geert Olaerts Nov 12 '18 at 08:04
  • And everything seems fine in your application logs? – sanchedale Nov 12 '18 at 08:13
  • Yep, we even put it in trace logging and saw nothing special, no errors that we could see. – Geert Olaerts Nov 13 '18 at 09:27
  • @GeertOlaerts - Did you ever discover what was causing this behavior? I'm experiencing the same issue currently, and am on the verge of moving this service to an ec2 instance specifically to avoid dealing with these automatic restarts. If automatic restarts are an inescapable aspect of using containers in Google App Engine, even with 1 pod min/max explicitly defined and no healthcheck downtime, then this is important information. – Robert Christ Sep 22 '21 at 15:01