65

I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?

According to the OpenShift 3 documentation (https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html), which is built on top of Kubernetes, if a master fails, nodes continue to function properly, but the system loses its ability to manage pods. Is this the same for vanilla Kubernetes?

David Newswanger

3 Answers

83

In typical setups, the master nodes run both the API and etcd and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.

In the event that they, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for this period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, and so on, until both of the following hold (a quick way to check each is sketched just after this list):

  1. Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
  2. At least one API server can service requests
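
Roughly, you can check each condition like this (a sketch that assumes a kubeadm-style cluster where etcd runs as static pods on the masters and uses kubeadm's default certificate paths; adjust paths for your distribution):

```
# Is at least one API server answering? (run from any machine with a kubeconfig)
kubectl get --raw='/readyz?verbose'

# Does etcd still have quorum? Run on a master node.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint status --cluster -w table
```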

In a partially degraded state, the API server may be able to respond to requests that only read data.

However, in any case, life for applications will continue as normal unless nodes are rebooted or there is a dramatic failure of some sort during this time, because TCP/UDP services, load balancers, DNS, the dashboard, etc. should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single-master setups or during a complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable; see the coredns cache plugin documentation). This is a good reason to consider a multi-master setup: DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
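
To see what your DNS cache window actually is, you can inspect the CoreDNS Corefile (a sketch assuming CoreDNS is deployed the usual way as a ConfigMap in kube-system; kubeadm's default Corefile typically contains a `cache 30` stanza, i.e. answers are cached for roughly 30 seconds):

```
# Show the Corefile; look for the `cache` plugin stanza to see the TTL
kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}'
```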

There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries, and in fact probably all pod and service networking functionality, to fail until at least one master comes back online. Restarting DNS pods or kube-proxy would also be bad.

If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind, or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as that routes traffic through the master(s).
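
For example, with kind you can simulate a total control-plane outage by stopping the control-plane container (a sketch; it assumes a cluster created with one control-plane and one worker node under the default cluster name, so the containers are called kind-control-plane and kind-worker):

```
# kind-config.yaml (hypothetical file name) would contain:
#   kind: Cluster
#   apiVersion: kind.x-k8s.io/v1alpha4
#   nodes:
#     - role: control-plane
#     - role: worker
kind create cluster --config kind-config.yaml

# ... deploy a test Deployment plus a NodePort Service here ...

docker stop kind-control-plane    # simulate the master going down

kubectl get nodes                 # now fails: the API server is unreachable
# Pods already running on kind-worker keep running and serving traffic;
# nothing new gets scheduled until the control plane returns:
docker start kind-control-plane
```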

pnovotnak
  • 6
    Have brought down master many times without bringing down the running apps. – manojlds Aug 26 '16 at 19:58
  • 4
    Thanks for the information! Just out of curiosity, how does Kubernetes route requests among nodes in the cluster? How do requests enter the cluster if there is no central machine to point a DNS entry to? – David Newswanger Aug 26 '16 at 20:13
  • Networking works differently depending on where your resources are deployed, but typically the thing that routes traffic to a service in the cluster is a load balancer that your cloud provider provides as a service. Therefore, packets are routed to the cluster independent of any kubernetes component's operation. In the case of a NodePort, the public IP of any node provides an entrypoint for traffic. From there, routing becomes a little more complicated; http://kubernetes.io/docs/admin/networking/ – pnovotnak Aug 26 '16 at 20:56
  • 6
    As for DNS, the DNS services run in a pod on each node--["The running Kubernetes DNS pod holds 3 containers - kubedns, dnsmasq and a health check called healthz. The kubedns process watches the Kubernetes master for changes in Services and Endpoints, and maintains in-memory lookup structures to service DNS requests..."](http://kubernetes.io/docs/admin/dns/#how-it-works) So, it appears DNS entries will continue to resolve during this time, but rebooting a node will cause DNS to stop resolving on that node. – pnovotnak Aug 26 '16 at 20:59
  • 1
    Note that OpenShift is slightly different than Kubernetes for DNS and for masters - the default configuration is HA masters and DNS running on the master process. Failures of all three masters would be required for DNS to fail to resolve, and a majority of etcd instances have to be down. – Clayton Aug 27 '16 at 00:00
  • One case where apps won't keep functioning is where the master node runs worker pods (i.e. by removing the taints). kube-proxy will keep routing `Service` traffic to that node (via iptables rules), causing a fraction of your Service requests to fail, because it relies on the master to tell it that the node is not up. – bcoughlan Feb 17 '20 at 14:27
  • What if I use `--type="ClusterIP"`. Is it correct that the requests are routed through the master node? And then, if the master node goes down, the applications running on the worker nodes will be useless? Because the users' requests won't be able to reach the workers. Right? @pnovotnak – steoiatsl Jun 13 '21 at 20:13
  • No, traffic wouldn't normally flow through the master(s). Traffic to service IPs will flow through the kube-proxy process on a node (by default, some CNIs supplant kube-proxy), which load balances across pod endpoints. Those pod endpoints are usually (again, probably depends on your specific configuration) routable by the kernel. Details aside, I've never heard of a configuration where traffic flows through masters. A cluster with that configuration would have poor scaling properties. PS: I assume that kube-proxy will not work well if it is restarted while the masters are down. – pnovotnak Jun 15 '21 at 06:14
  • Sorry, bear with me. I have a configuration with a master and a worker. This is how I set it up: `kubectl expose deployment/kubernetes-bootcamp --type="ClusterIP" --port 8080`, `kubectl port-forward --address 0.0.0.0 service/kubernetes-bootcamp 8080:8080 > /dev/null 2>&1 &`. I can access the application by doing `curl localhost:8080` on the master, or by going to `http://ip.of.master:8080`. So if the master node goes down, how can the user access the application? The master here is a single point of failure right? (Although worker(s) are still running) @pnovotnak – steoiatsl Jun 16 '21 at 23:05
  • 1
    In this case, I believe you are correct. I've noted that `kubectl proxy` cannot be used for testing for this reason, as it routes traffic through the API server – pnovotnak Jun 22 '21 at 20:42
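
Expanding on the last two comments: `kubectl port-forward`, like `kubectl proxy`, tunnels through the API server, so it dies with the master. A rough sketch of exposing the same bootcamp deployment in a way that does not depend on the control plane (NodePort here; the names are taken from the comment above, and `<allocated-node-port>` is whatever port the Service gets assigned):

```
# Replace the ClusterIP + port-forward combination with a NodePort Service
kubectl expose deployment/kubernetes-bootcamp --type=NodePort --port 8080

# Find the allocated node port (in the 30000-32767 range by default)
kubectl get service kubernetes-bootcamp

# Traffic now enters via kube-proxy on any node, not via the API server:
curl http://ip.of.worker:<allocated-node-port>
```
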
23

A Kubernetes cluster without a master is like a company running without a Manager.

No one else can instruct the workers (the k8s components) other than the Manager (the master node)
(even you, the owner of the cluster, can only instruct the Manager).

Everything keeps working as usual until the work is finished or something stops it (because the master node died after assigning the work).

As there is no Manager to re-assign any work to them, the workers will wait and wait until the Manager comes back.

The best practice is to assign multiple Managers (masters) to your cluster.
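
For example, with kubeadm a stacked HA control plane is set up roughly like this (a sketch; LOAD_BALANCER_DNS is a placeholder for a load balancer or DNS name sitting in front of the API servers, and the token/hash/key values come from the output of the init command):

```
# On the first control-plane node
kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:6443" --upload-certs

# On each additional control-plane node, using the join command printed above
kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane --certificate-key <key>
```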

AATHITH RAJENDRAN
2

Although your data plane and running applications do not immediately start breaking, there are several scenarios where cluster admins will wish they had a multi-master setup. Key to understanding the impact is understanding which components talk to the master, for what and how, and more importantly when they will fail if the master fails.

Although the application pods running on your data plane will not be immediately impacted, imagine a very possible scenario: your traffic suddenly surges and your Horizontal Pod Autoscaler kicks in. Autoscaling will not work, because the Metrics Server collects resource metrics from the kubelets and exposes them through the Metrics API on the Kubernetes API server for use by the Horizontal and Vertical Pod Autoscalers, and your API server is already dead. If your pods' memory shoots up because of the high load, they will eventually be killed by the OOM killer. If any of the pods die, replacements will not be created, because the controller manager and scheduler watch the API server for the current state of pods, so they fail too. In short, no new pod will be scheduled and your application may stop responding.
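
To make the autoscaling example concrete, here is a minimal HPA (a sketch; the deployment name `web` is hypothetical). While the API server or the Metrics Server is unreachable, the HPA controller cannot read metrics or update the replica count:

```
# Create an HPA that scales `web` between 2 and 10 replicas at 80% CPU
kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=80

# Inspect it; TARGETS shows <unknown> when metrics cannot be collected
kubectl get hpa web
```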


One thing to highlight is that Kubernetes system components communicate only with the API server; they don't talk to each other directly, so when the API server is gone, much of their functionality fails with it. An unavailable control plane can mean several things: failure of any or all of these components (API server, etcd, kube-scheduler, controller manager) or, at worst, the entire master node has crashed.
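
On a kubeadm-style cluster you can see these components as static pods and check them directly (a sketch; the tier=control-plane label is what kubeadm applies by default, and crictl on the master node is the fallback when the API server itself is the thing that is down):

```
# Control-plane components run as static pods in kube-system on kubeadm clusters
kubectl -n kube-system get pods -l tier=control-plane -o wide

# If the API server itself is down, inspect its container on the master instead
sudo crictl ps --name kube-apiserver
```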

If the API server is unavailable, no one can use kubectl, as virtually all commands talk to the API server (meaning you cannot connect to the cluster and cannot exec into any pod to check anything on the container file system). You will not be able to see application logs unless you have an additional centralized log management system.
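
If you do need application logs during such an outage and have no centralized logging, you can still read container logs directly on each worker node (a sketch assuming a containerd-based node with crictl installed; on Docker-based nodes, docker ps / docker logs is the equivalent):

```
# On the worker node itself, bypassing the API server entirely
sudo crictl ps                       # list running containers
sudo crictl logs <container-id>      # read a container's logs

# The kubelet also keeps plain log files on disk
sudo ls /var/log/pods/
```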

If the etcd database fails or gets corrupted, your entire cluster state is gone, and the admins will want to restore it from backups as early as possible.
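
This is why taking regular etcd snapshots matters. Roughly (a sketch assuming etcdctl v3 and kubeadm's default certificate paths; the backup location is a placeholder):

```
# Take a snapshot while the cluster is healthy
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Later, restore it into a fresh data directory and point etcd at it
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd-snapshot.db \
  --data-dir /var/lib/etcd-restored
```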

In short, a failed single-master control plane may not immediately impact the cluster's traffic-serving capability, but it cannot be relied on to keep serving your traffic.

Shailendra