
From what I've read about Kubernetes, if the master(s) die, the workers should still be able to function as normal (https://stackoverflow.com/a/39173007/281469), although no new scheduling will occur.

However, I've found this to not be the case when the master can also schedule worker pods. Take a 2-node cluster, where one node is a master and the other a worker, and the master has the taints removed:

[diagram: two-node cluster, one master (taints removed) and one worker]
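
The taint was removed with the usual command, something along these lines (the node name is illustrative):

kubectl taint nodes master-node node-role.kubernetes.io/master-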

If I shut down the master and docker exec into one of the containers on the worker I can see that:

nc -zv ip-of-pod 80

succeeds, but

nc -zv ip-of-service 80

fails half of the time. The Kubernetes version is v1.15.10, using iptables mode for kube-proxy.
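
To make the flakiness easier to see, a quick loop from inside the container (using the same placeholder service IP):

for i in $(seq 1 20); do nc -zv -w 1 ip-of-service 80; done

Roughly half of the attempts fail, consistent with a Service that still has two endpoints but only one live backend.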

I'm guessing that since the kube-proxy on the worker node can't connect to the apiserver, it will not remove the endpoints for pods on the master node from its iptables rules.
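
One way to check this is to inspect the NAT rules kube-proxy programmed on the worker; if the guess is right, the KUBE-SEP entries for the Service should still include the IP of the pod that was running on the (now unreachable) master:

sudo iptables-save -t nat | grep -E 'KUBE-(SVC|SEP)'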

Questions:

  1. Is it expected behaviour that kube-proxy won't stop routing to pods on master nodes, or is there something "broken"?
  2. Are any workarounds available for this kind of setup to allow the worker nodes to still function correctly?

I realise the best thing to do is to separate the control-plane nodes, but that's not viable for what I'm working on at the moment.

bcoughlan
  • I did a write-up about the nature of this problem in the end https://bcoughlan.github.io/posts/kubernetes-master-fail/ – bcoughlan Jul 25 '23 at 09:40

3 Answers


Is it expected behaviour that kube-proxy won't stop routing to pods on master nodes, or is there something "broken"?

Are any workarounds available for this kind of setup to allow the worker nodes to still function correctly?

The cluster master plays the role of decision maker for the various activities in the cluster's nodes: scheduling workloads, managing the workloads' lifecycle, scaling, and so on. Each node is managed by the master components and runs the services necessary to run pods; these typically include the kube-proxy, a container runtime and the kubelet.

The kube-proxy component enforces network rules on each node and helps Kubernetes manage connectivity between Pods and Services. It also acts as a load-balancing controller of sorts: it watches the Kubernetes API server and continually updates the node's iptables rules based on what it observes.
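
You can confirm which mode kube-proxy is running in from its logs (the pod name here is just a placeholder):

kubectl -n kube-system logs kube-proxy-xxxxx | grep -i proxier

In iptables mode it typically logs a line such as "Using iptables Proxier".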

In simple terms, only the master has the full picture and is in charge of producing the routing rules, based on nodes and endpoints being added or removed. kube-proxy acts as an enforcer: it checks in with the master, syncs that information and applies the resulting rules on its node.

If the master node (API server) is down, the cluster cannot respond to API commands or schedule new workloads. If no other master node is available, nothing can instruct the worker nodes to change their work allocation, so they keep executing whatever was scheduled earlier until the master comes back and issues different instructions. In the same way, kube-proxy can no longer sync the latest rules from the master, but it does not stop routing: it keeps handling networking with the iptables rules that were in place before the master went down, so traffic to your pods keeps flowing as long as the pods on the worker nodes are still up and running.

A single-master architecture is not a preferred deployment architecture for production. Since resilience and reliability are among the major goals of Kubernetes, the recommended best practice is an HA cluster architecture that avoids a single point of failure.
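
For example, with kubeadm a stacked HA control plane can be bootstrapped roughly like this (the load balancer endpoint, token, hash and key are placeholders):

# on the first control-plane node, point the API server address at a load balancer
kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:6443" --upload-certs

# on each additional control-plane node, use the join command printed by init
kubeadm join LOAD_BALANCER_DNS:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --control-plane --certificate-key <key>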

Karthik Balaguru

Once you remove the taints, the Kubernetes scheduler doesn't need any tolerations to schedule pods on your master node. It is then effectively the same as a worker node, just with the control plane components running on it, and you can also run your workload pods on this node (although it's not a recommended practice).
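
You can verify that the taint is gone with (the node name is a placeholder):

kubectl describe node master-node | grep Taints

It should report Taints: <none> once the master taint has been removed.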

Kube-proxy (https://kubernetes.io/docs/concepts/overview/components/#kube-proxy) is the component deployed on all nodes of the cluster, and it handles networking and routing of connections to your pods. So even if your master node is down, kube-proxy still works fine on the worker node and will route traffic to the pods running on it.

If all of your pods are running on worker nodes that are still up, kube-proxy will continue to route traffic to them, including traffic that goes via a Service.
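
You can check which backends a Service routes to, and which node each backing pod runs on, with something like (the service and label names are placeholders):

kubectl get endpoints my-service
kubectl get pods -l app=my-app -o wide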

Anmol Agrawal
  • Is this still true in iptables mode? How does kube-proxy know to remove the node if the master is down? "If kube-proxy is running in iptables mode and the first Pod that’s selected does not respond, the connection fails. This is different from userspace mode: in that scenario, kube-proxy would detect that the connection to the first Pod had failed and would automatically retry with a different backend Pod." https://kubernetes.io/docs/concepts/services-networking/service/ – bcoughlan Feb 15 '20 at 23:24

There is nothing inherent in Kubernetes that would cause this. The master node role is just for humans, and if you've removed the taints then the nodes are just normal nodes. That said, remember that the usual rules about scheduling and resource requests still apply, so if your pods don't all fit they won't be scheduled. It's possible your Kubernetes deployment system set up more specialized firewall rules or similar around the control plane nodes, but that would depend on that system.
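
For instance, to see whether your pods actually fit on a node, you can compare its allocatable resources with what is already requested (the node name is a placeholder):

kubectl describe node worker-node | grep -A 8 'Allocated resources'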

coderanger