
How does AKS make a not-ready pod unavailable to accept requests? Does this only work if you have a Service in front of that Deployment?

I'd like to start by explaining what I noticed in AKS without Azure CNI configured, and then describe what I have been seeing in AKS with Azure CNI enabled.

In AKS without Azure CNI enabled, if I curl a not-ready pod behind a service like this: `curl -I some-pod.some-service.some-namespace.svc.cluster.local:8080`, the response is an unresolvable hostname (or something along those lines). My understanding is that DNS simply doesn't have this entry, and that this is how AKS normally keeps not-ready pods from receiving requests.

In AKS with Azure CNI enabled, if I execute the same request against a not-ready pod, the hostname resolves and the request reaches the pod. One caveat: when I send a request through the service's external private IP, it does not reach the not-ready pod, which is expected and seems to work correctly. But when I execute the request mentioned above, `curl -I some-pod.some-service.some-namespace.svc.cluster.local:8080`, it works even though it shouldn't. Why does DNS have that entry in the Azure CNI case?

Is there anything I can do to configure Azure CNI to behave more like the default AKS behavior, where a curl request like that either fails to resolve the hostname or refuses the connection?

  • In general a pod has the following DNS resolution: `pod-ip-address.my-namespace.pod.cluster-domain.example`. For example, if a pod in the `default` namespace has the IP address 172.17.0.3, and the domain name for your cluster is `cluster.local`, then the Pod has a DNS name: `172-17-0-3.default.pod.cluster.local`. Any pods created by a *Deployment* or *DaemonSet* exposed by a Service have the following DNS resolution available: `pod-ip-address.deployment-name.my-namespace.svc.cluster-domain.example`. Can you please edit the FQDNs you are trying to reach accordingly? – Srijit_Bose-MSFT Sep 13 '21 at 08:43

1 Answer


Assuming that "not ready pod" refers to a pod whose readiness probe is failing: the kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers. [Reference]

However, the logic that determines a pod's readiness may or may not have anything to do with whether the pod can actually serve requests; it depends entirely on what the user configures.

For instance, consider a Pod with the following manifest:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-pod
spec:
  containers:
  - name: readiness-container
    image: nginx
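    # Readiness is determined solely by this exec probe (the presence of /tmp/healthy),
    # not by whether nginx is actually serving traffic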
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

Here, readiness is decided based on the existence of the file /tmp/healthy, irrespective of whether nginx is serving the application. After running the pod and exposing it with a Service readiness-svc, we can observe the following:

kubectl exec readiness-pod -- /bin/bash -c 'if [ -f /tmp/healthy ]; then echo "/tmp/healthy file is present";else echo "/tmp/healthy file is absent";fi'
/tmp/healthy file is absent

kubectl get pods -o wide
NAME            READY   STATUS    RESTARTS   AGE    IP            NODE                                NOMINATED NODE   READINESS GATES
readiness-pod   0/1     Running   0          11m    10.240.0.28   aks-nodepool1-29819654-vmss000000   <none>           <none>
source-pod      1/1     Running   0          6h8m   10.240.0.27   aks-nodepool1-29819654-vmss000000   <none>           <none>

kubectl describe svc readiness-svc
Name:              readiness-svc
Namespace:         default
Labels:            test=readiness
Annotations:       <none>
Selector:          test=readiness
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.0.23.194
IPs:               10.0.23.194
Port:              <unset>  80/TCP
TargetPort:        80/TCP
Endpoints:
Session Affinity:  None
Events:            <none>

kubectl exec -it source-pod -- bash
root@source-pod:/# curl -I readiness-svc.default.svc.cluster.local:80
curl: (7) Failed to connect to readiness-svc.default.svc.cluster.local port 80: Connection refused
root@source-pod:/# curl -I 10-240-0-28.default.pod.cluster.local:80
HTTP/1.1 200 OK
Server: nginx/1.21.3
Date: Mon, 13 Sep 2021 14:50:17 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 07 Sep 2021 15:21:03 GMT
Connection: keep-alive
ETag: "6137835f-267"
Accept-Ranges: bytes

Thus, we can see that when we try to connect from source-pod to the Service readiness-svc.default.svc.cluster.local on port 80, the connection is refused. This is because the kubelet did not find the /tmp/healthy file in the readiness-pod container when performing the cat operation, consequently marking the Pod readiness-pod as not ready to serve traffic and removing it from the backends of the Service readiness-svc. However, the nginx server in the pod can still serve the web application, and it will continue to do so if you connect directly to the pod.
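
As a quick counter-check (a minimal sketch continuing the same example; exact output omitted), creating the file that the probe looks for should make the pod ready again and put it back behind the Service on the next probe cycle:

kubectl exec readiness-pod -- touch /tmp/healthy
kubectl get pod readiness-pod        # READY should flip to 1/1 after the next probe
kubectl exec -it source-pod -- curl -I readiness-svc.default.svc.cluster.local:80   # should now return HTTP/1.1 200 OK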

Readiness probe failures of containers do not remove the DNS records of Pods. The DNS records of a Pod share their lifespan with the Pod itself.
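
As a quick check of this (assuming a DNS tool such as nslookup is available in the source-pod image), the pod record resolves even while readiness-pod is reported as not ready:

kubectl exec -it source-pod -- nslookup 10-240-0-28.default.pod.cluster.local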

This behavior is characteristic of Kubernetes and does not change with network plugins. We have attempted to reproduce the issue and have observed the same behavior with AKS clusters using both the kubenet and Azure CNI network plugins.
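
If you want to confirm which network plugin a given AKS cluster uses before comparing behavior, the cluster's network profile can be queried with the Azure CLI (resource group and cluster name below are placeholders):

az aks show --resource-group <resource-group> --name <cluster-name> --query networkProfile.networkPlugin -o tsv

This returns kubenet or azure accordingly.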

– Srijit_Bose-MSFT