I have a stateful game app deployed on Cloud Foundry. When I update the app, I need it to shut down gracefully, i.e. the old version should not be stopped until all running games have finished. According to the CF docs, when issuing cf stop, the app is only given 10 seconds to shut down after the SIGTERM is sent, before the app gets killed using SIGKILL. This is not working in my case.

I thought about pushing state into something like a DB or Redis and then hot-swapping the running games to the new app instance, but since I rely heavily on WebSockets, this seems to create more problems than it solves, because it would also break the existing connections.

Another solution would be to not send a cf stop at all, but instead add an operations endpoint to my app, like POST /api/admin/stop, which makes the app stop accepting new games and then shut itself down after all running games have finished.
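To make the idea concrete, here is a rough sketch of the bookkeeping I have in mind. All names (GameRegistry, begin_drain, and so on) are made up for illustration, and the HTTP wiring for POST /api/admin/stop is left out; the handler would simply call begin_drain().

```python
import threading


class GameRegistry:
    """Tracks running games and refuses new ones once draining has begun (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()
        self._draining = False
        # Set once draining was requested and no games remain.
        self.drained = threading.Event()

    def start_game(self, game_id):
        """Register a new game; returns False if the app is draining."""
        with self._lock:
            if self._draining:
                return False
            self._running.add(game_id)
            return True

    def finish_game(self, game_id):
        """Mark a game as finished and check whether the drain is complete."""
        with self._lock:
            self._running.discard(game_id)
            self._signal_if_drained()

    def begin_drain(self):
        """Called by the admin endpoint: stop accepting new games."""
        with self._lock:
            self._draining = True
            self._signal_if_drained()

    def _signal_if_drained(self):
        # Caller must hold the lock.
        if self._draining and not self._running:
            self.drained.set()
```

The main thread would then block on registry.drained.wait() and exit the process afterwards.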

A third option could be to change the design entirely and use something like WebRTC as the protocol, which means the app would only serve the static resources and would no longer have an active role in running games, because all clients would connect to each other directly instead of through the server. I am not experienced with WebRTC, though, and wonder whether that solution works reliably, e.g. if some users are on a VPN.

Right now I am in favor of the second option. But is a CF app supposed to be able to terminate itself? And if so, how can that be done cleanly?

Or are there any other options? What's the best solution?

xiaoye

1 Answer

According to the CF docs, when issuing cf stop, the app is only given 10 seconds to shut down after the SIGTERM is sent, before the app gets killed using SIGKILL. This is not working in my case.

This is the expected behavior. If you're not seeing that, you'd want to look into things more closely. You didn't mention the language/frameworks you're using, and some languages make it easier and some make it harder to handle signals (some even handle them by default).

I'd suggest the following:

  1. You'd want to look into the specifics of your application's language and see how to handle signals. Make sure that you're implementing that correctly in your application. Even in languages that handle signals automatically, you will still need to tie into some sort of shutdown hook so that you can execute your custom shutdown code and save your state.

  2. You need to catch the signal and handle it as quickly as possible. You only get 10s before that SIGKILL is sent and then your app terminates.

  3. I would suggest writing a trivial sample app that handles signals. You can log when the signal is caught and play around with pausing for different durations afterwards to see how the platform handles things. This is also useful because 10s is just the default value for how long the platform will wait. Your platform operator can change that value, so it could potentially be shorter or longer. A test like this lets you find the exact value set by your operator.
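Since you didn't mention a language, here is a minimal sketch of such a test app in Python, purely as an illustration. The names ShutdownProbe and main are made up for this example. The handler only records when SIGTERM arrived; the main loop then logs once per second until the platform kills the process, so the last log line tells you roughly how long the grace period is.

```python
import signal
import time


class ShutdownProbe:
    """Records when SIGTERM arrives so the grace period can be measured (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.term_received_at = None

    def install(self):
        # Register the handler for SIGTERM on the main thread.
        signal.signal(signal.SIGTERM, self.handle)

    def handle(self, signum, frame):
        # Keep the handler tiny: just remember the first time the signal was seen.
        if self.term_received_at is None:
            self.term_received_at = self._clock()

    def seconds_since_term(self):
        """Seconds elapsed since SIGTERM, or None if it has not arrived yet."""
        if self.term_received_at is None:
            return None
        return self._clock() - self.term_received_at


def main(tick=1.0):
    """Idle until SIGTERM, then log every tick until the process is killed."""
    probe = ShutdownProbe()
    probe.install()
    while probe.term_received_at is None:
        time.sleep(tick)
    print("SIGTERM caught, custom shutdown code would run here", flush=True)
    while True:
        print(f"still alive {probe.seconds_since_term():.1f}s after SIGTERM", flush=True)
        time.sleep(tick)

# To use this as the test app, call main() as the last line of the script.
```

Push it, run cf stop, and check the recent logs to see the last "still alive" line that made it out before the SIGKILL.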

I thought about pushing state into something like a DB or Redis and then hot-swapping the running games to the new app instance, but since I rely heavily on WebSockets, this seems to create more problems than it solves, because it would also break the existing connections.

I don't have a lot of comments on your app's architecture as I don't know enough about what you're doing to make educated comments besides the following note.

When running on CF, you want to keep as much state as you possibly can in backing services. Don't write to the local disk, and try to make sure you have things like session state stored in a durable cache (like Redis).
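As a rough illustration of that idea, the sketch below writes game state to a key-value backend after every change, so a fresh instance can pick a game up again. Everything here is an assumption for the sake of the example: GameStateStore is a made-up name, and InMemoryBackend is only a stand-in for a real service such as Redis, whose client you would plug in instead.

```python
import json


class InMemoryBackend:
    """Stand-in for an external key-value service such as Redis (assumption, for illustration only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class GameStateStore:
    """Serializes game state to the backend so no instance keeps it only in memory."""

    def __init__(self, backend):
        self._backend = backend

    def save(self, game_id, state):
        # Store the state as JSON under a per-game key.
        self._backend.set(f"game:{game_id}", json.dumps(state))

    def load(self, game_id):
        # Returns the saved state, or None if the game is unknown.
        raw = self._backend.get(f"game:{game_id}")
        return None if raw is None else json.loads(raw)
```

This does not solve the dropped WebSocket connections by itself; clients would still need to reconnect, but the instance they land on could then resume the game from the stored state.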

Application instances can be shut down for a number of reasons. One of those is if you run cf stop, but they can also be shut down if your platform operations team is doing updates or if a Diego Cell (where your app containers live) crashes. Some of the reasons your app will be shut down are not things you can control, so you have to architect with that in mind.

The key things to keep in mind are a.) if you have multiple instances, the platform will guarantee that you always have at least one running and b.) given a.) you need to be able to balance requests across multiple app instances. If you can manage those two items, your app should run pretty well on CF.

Daniel Mikusa