kubelet.service: Unit entered failed state in not ready state node error from kubernetes cluster

Question

I am trying to deploy Spring Boot microservices to a Kubernetes cluster with 1 master and 2 worker nodes. When I check the node state with sudo kubectl get nodes, one of my worker nodes shows a status of NotReady.

To troubleshoot, I ran the following command:

sudo journalctl -u kubelet

The output contains kubelet.service: Unit entered failed state, and the kubelet service has stopped. The following is the output I get from sudo journalctl -u kubelet:

-- Logs begin at Fri 2020-01-03 04:56:18 EST, end at Fri 2020-01-03 05:32:47 EST. --
Jan 03 04:56:25 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:31 MILDEVKUB050 kubelet[970]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
Jan 03 04:56:31 MILDEVKUB050 kubelet[970]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.053962     970 server.go:416] Version: v1.17.0
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.084061     970 plugins.go:100] No cloud provider specified.
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.235928     970 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.280173     970 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curre
Jan 03 04:56:38 MILDEVKUB050 kubelet[970]: I0103 04:56:38.107966     970 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Jan 03 04:56:38 MILDEVKUB050 kubelet[970]: F0103 04:56:38.109401     970 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable swa
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.901632    1433 server.go:416] Version: v1.17.0
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.907654    1433 plugins.go:100] No cloud provider specified.
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.907806    1433 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.947107    1433 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curr
Jan 03 04:56:49 MILDEVKUB050 kubelet[1433]: I0103 04:56:49.263777    1433 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to
Jan 03 04:56:49 MILDEVKUB050 kubelet[1433]: F0103 04:56:49.264219    1433 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable sw
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.712729    1500 server.go:416] Version: v1.17.0
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.714927    1500 plugins.go:100] No cloud provider specified.
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.715248    1500 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.763508    1500 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curr
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.956706    1500 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: F0103 04:56:59.957078    1500 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable sw
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.


I tried restarting the kubelet, but the node state does not change. It stays in the not ready state.

Updates

When I run systemctl list-units --type=swap --state=active, I get the following response:

docker@MILDEVKUB040:~$ systemctl list-units --type=swap --state=active
UNIT                                            LOAD   ACTIVE SUB    DESCRIPTION
dev-mapper-MILDEVDCR01\x2d\x2dvg\x2dswap_1.swap loaded active active /dev/mapper/MILDEVDCR01--vg-swap_1

Important

Whenever a node goes into the not ready state like this, I have to disable swap, reload the systemd daemon, and restart kubelet. After that the node becomes ready, but some time later I have to repeat the same steps.

How can I find a permanent solution for this?

What is the OS of the master and the nodes, and how did you install Kubernetes? Did you use kubeadm? – P Ekambaram, Jan 06 '20 at 08:41
I am using Ubuntu 16.04 and kubeadm for the Kubernetes installation. – Mr.DevEng, Jan 06 '20 at 09:13
Share the output of cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and 'docker info | grep Driver'. – P Ekambaram, Jan 06 '20 at 09:30

Shashank V · Answer 1 · 2020-01-06T09:17:47.343

5

failed to run Kubelet: running with swap on is not supported, please disable swap

You need to disable swap on the system for kubelet to work. You can disable swap with sudo swapoff -a.

On systemd-based systems, swap partitions can also be enabled through swap units, which are activated again whenever systemd reloads, even if you have turned swap off with swapoff -a.

https://www.freedesktop.org/software/systemd/man/systemd.swap.html

Check if you have any swap units using systemctl list-units --type=swap --state=active

You can permanently disable any active swap unit with systemctl mask <unit name>.

Note: Do not use systemctl disable <unit name> to disable the swap unit, because the swap unit will be activated again when systemd reloads. Use systemctl mask <unit name> only.
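A minimal sketch of the masking step, assuming the usual column layout of systemctl list-units (unit name in the first column). The helper name swap_unit_names is hypothetical; it only extracts unit names from the listing, and the masking itself is left as a commented usage example so nothing changes until you run it deliberately.

```shell
#!/bin/sh
# swap_unit_names (hypothetical helper): read `systemctl list-units --type=swap`
# output on stdin and print the unit name (first column) of every *.swap row.
# Header and footer lines are skipped because their first field does not end
# in ".swap".
swap_unit_names() {
    awk '$1 ~ /\.swap$/ { print $1 }'
}

# Usage (run on the affected node; masking needs root):
#   systemctl list-units --type=swap --state=active --no-legend --plain \
#     | swap_unit_names \
#     | while IFS= read -r unit; do sudo systemctl mask "$unit"; done
#   sudo swapoff -a
```

Reading the names with read -r and quoting "$unit" matters here, because swap unit names often contain backslash escapes such as \x2d.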

To make sure swap doesn't get re-enabled when your system reboots due to a power cycle or any other reason, remove or comment out the swap entries in /etc/fstab.
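As a sketch of the fstab edit, assuming standard fstab formatting where the third whitespace-separated field is the filesystem type: the hypothetical helper comment_out_swap below prints a copy of the file with active swap entries commented out. It writes to stdout only and never touches the original, so the result can be reviewed first.

```shell
#!/bin/sh
# comment_out_swap (hypothetical helper): print the given fstab-style file
# with every uncommented entry whose filesystem type (3rd field) is "swap"
# prefixed with "#". All other lines are printed unchanged.
comment_out_swap() {
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Usage (review the diff before replacing the real file):
#   comment_out_swap /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
#   sudo cp /etc/fstab /etc/fstab.bak && sudo cp /tmp/fstab.new /etc/fstab
```

Only the swap lines change; the file itself and every other mount entry stay in place.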

Summarizing:

1. Run sudo swapoff -a.
2. Check whether you have swap units with systemctl list-units --type=swap --state=active. If there are any active swap units, mask them using systemctl mask <unit name>.
3. Remove the swap entries in /etc/fstab.

edited Jan 06 '20 at 09:17

answered Jan 03 '20 at 10:51

Shashank V


I already disabled swap while preparing the node, following the Kubernetes prerequisites documentation, by running `sudo swapoff -a`. – Mr.DevEng Jan 03 '20 at 10:55
Seems like you are using systemd. You probably have swap units. Check if you have swap units using `systemctl list-units --type=swap --state=active`. You can mask any active swap unit using `systemctl mask <unit name>`. – Shashank V Jan 03 '20 at 10:57
When I run the command to check for active swap units, I can see that one swap unit is active. After that I again need to disable swap and restart both the daemon and kubelet. Only then does the node show the ready state. This keeps repeating. Is there any permanent solution for this? I updated my question. Can you check my updates please? – Mr.DevEng Jan 06 '20 at 07:41
How are you disabling the swap? You should do it with the `systemctl mask <unit name>` command. – Shashank V Jan 06 '20 at 07:51
I am disabling it with the `sudo swapoff -a` command. After that I run `sudo systemctl daemon-reload` and `sudo systemctl restart kubelet`. Only then does the node get ready status. After some deployments it goes to the not ready state again and I need to repeat the same steps. – Mr.DevEng Jan 06 '20 at 09:07
As I have clearly mentioned in my answer, swap will get re-enabled when the systemd daemon reloads if you only turn it off with `sudo swapoff -a`. You also need to mask the unit with `systemctl mask <unit name>`. – Shashank V Jan 06 '20 at 09:09
I did not understand the difference between systemctl mask and swapoff. The official Kubernetes documentation says to run the swapoff command in its prerequisites. – Mr.DevEng Jan 06 '20 at 09:11
Ok. I understood. Let me try in that way. Thank you for your response. – Mr.DevEng Jan 06 '20 at 09:13
@Jacob Sure. I have improved the answer for better clarity. – Shashank V Jan 06 '20 at 09:18
Was this problem solved with @ShashankV's steps? Did you have the opportunity to check it? – Mark Watney Jan 07 '20 at 09:22

score 1 · Answer 2 · answered Jan 11 '20 at 05:49

The root cause is the swap space. To disable it completely, follow these steps:

1. Run swapoff -a: this disables swap immediately, but swap will be activated again on restart.
2. Remove any swap entry from /etc/fstab.
3. Reboot the system.

If the swap is gone, good. If, for some reason, it is still there, you have to remove the swap partition. Repeat steps 1 and 2 and, after that, use fdisk or parted to remove the (now unused) swap partition. Use great care here: removing the wrong partition will have disastrous effects!

Then reboot again.

This should resolve your issue.
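To check whether the swap is really gone after the reboot, one option is to look at /proc/swaps, which lists a header line followed by one line per active swap device. The sketch below assumes that format; the helper name swap_is_off is hypothetical.

```shell
#!/bin/sh
# swap_is_off (hypothetical helper): exit 0 when the given file, in
# /proc/swaps format, lists no swap devices, i.e. has no non-empty line
# after its header line. Defaults to /proc/swaps when no argument is given.
swap_is_off() {
    awk 'NR > 1 && NF > 0 { n++ } END { exit (n > 0) }' "${1:-/proc/swaps}"
}

# Usage after the reboot:
#   if swap_is_off; then echo "swap is off"; else cat /proc/swaps; fi
```

If the check fails, the remaining lines in /proc/swaps show which device is still being used as swap.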

score 0 · Answer 3 · edited Sep 07 '22 at 12:03


Removing /etc/fstab itself breaks the VM, so I think we should find another way to solve this issue. When I tried removing fstab, every command (install, ping, and others) failed with an error.

edited Sep 07 '22 at 12:03

sidharth vijayakumar


answered Aug 27 '22 at 14:53

Rizki Sadewa


