Azure Application gateway ingress is an ingress controller for your kubernetes deployment which allows you to use native Azure Application gateway to expose your application to the internet. Its purpose is to route the traffic to pods directly. At the same moment all questions about pods availability, scheduling and generally speaking management is on kubernetes itself.
When a pod receives a command to be terminated it doesn't happen instantly. Right after kube-proxies will update iptables to stop directing traffic to the pod. Also there may be ingress controllers or load balancers forwarding connections directly to the pod (which is the case with an application gateway). It's impossible to solve this issue completely, while adding 5-10 seconds delay can significantly improve users experience.
If you need to terminate or scale down your application, you should consider following steps:
- Wait for a few seconds and then stop accepting connections
- Close all keep-alive connections not in the middle of request
- Wait for all active requests to finish
- Shut down the application completely
Here are exact kubernetes mechanics which will help you to resolve your questions:
preStop hook - this hook is called immediately before a container is terminated. This is very helpful for graceful shutdowns of an application. For example simple sh command with "sleep 5" command in a preStop hook can prevent users to see "Connection refused errors". After the pod receives an API request to be terminated, it takes some time to update iptables and let an application gateway know that this pod is out of service. Since preStop hook is executed prior SIGTERM signal, it will help to resolve this issue.
(example can be found in attach lifecycle event)
readiness probe - this type of probe always runs on the container and defines whether pod is ready to accept and serve requests or not. When container's readiness probe returns success, it means the container can handle requests and it will be added to the endpoints. If a readiness probe fails, a pod is not capable to handle requests and it will be removed from endpoints object. It works very well with newly created pods when an application takes some time to load as well as for already running pods if an application takes some time for processing.
Before removing from the endpoints readiness probe should fail several times. It's possible to lower this amount to only one fail using failureTreshold
field, however it still needs to detect one failed check.
(additional information on how to set it up can be found in configure liveness readiness startup probes)
startup probe - for some applications which require additional time on their first initialisation it can be tricky to set up a readiness probe parameters correctly and not compromise a fast response from the application.
Using failureThreshold * periodSeconds
fields will provide this flexibility.
terminationGracePeriod - is also may be considered if an application requires more than default 30 seconds delay to gracefully shut down (e.g. this is important for stateful applications)