Kubespray 'Disable swap' task fails: 'swapoff' returns 'non-zero return code'

Question

I ran Kubespray in LXC containers with the configuration below (server RAM: 8 GB; all nodes run Ubuntu 18.04):

+---------+---------+----------------------+
|  NAME   |  STATE  |         IPV4         |
+---------+---------+----------------------+
| ansible | RUNNING | 10.21.185.23 (eth0)  |
| node1   | RUNNING | 10.21.185.158 (eth0) |
| node2   | RUNNING | 10.21.185.186 (eth0) |
| node3   | RUNNING | 10.21.185.65 (eth0)  |
| node4   | RUNNING | 10.21.185.106 (eth0) |
| node5   | RUNNING | 10.21.185.14 (eth0)  |
+---------+---------+----------------------+

On root@ansible, when I ran the Kubespray playbook to build the cluster, I got this error:


TASK [kubernetes/preinstall : Disable swap] ******************
fatal: [node1]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.020302", "end": "2020-05-13 07:21:24.974910", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:24.954608", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [node2]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.010084", "end": "2020-05-13 07:21:25.051443", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:25.041359", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [node3]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.008382", "end": "2020-05-13 07:21:25.126695", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:25.118313", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
fatal: [node4]: FAILED! => {"changed": true, "cmd": ["/sbin/swapoff", "-a"], "delta": "0:00:00.006829", "end": "2020-05-13 07:21:25.196145", "msg": "non-zero return code", "rc": 255, "start": "2020-05-13 07:21:25.189316", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

LXC container configuration (the same for node1, node2, node3, node4 and node5):

architecture: x86_64
config:
  image.architecture: amd64
  image.description: ubuntu 18.04 LTS amd64 (release) (20200506)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200506"
  image.version: "18.04"
  limits.cpu: "2"
  limits.memory: 2GB
  limits.memory.swap: "false"
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
  raw.lxc: "lxc.apparmor.profile=unconfined\nlxc.cap.drop= \nlxc.cgroup.devices.allow=a\nlxc.mount.auto=proc:rw
    sys:rw"
  security.nesting: "true"
  security.privileged: "true"
  volatile.base_image: 93b9eeb85479af2029203b4a56a2f1fdca6a0e1bf23cdc26b567790bf0f3f3bd
  volatile.eth0.hwaddr: 00:16:3e:5a:91:9a
  volatile.idmap.base: "0"
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""

When I run swapoff manually on the nodes, it prints nothing:

root@node1:~# /sbin/swapoff -a
root@node1:~#

Any ideas would be very helpful.

If you run the command `/sbin/swapoff -a` on one of the nodes manually, what output do you get? - Tummala Dhanvi, May 13 '20 at 11:51
Have you set the parameter `limits.memory.swap: "false"` when creating these nodes? - Dawid Kruk, May 13 '20 at 17:27
Yes, I set limits.memory.swap: "false", but the result is the same. - Sajjad Hadafi, May 14 '20 at 08:22

score 2 · Accepted Answer · answered May 28 '20 at 10:32

I divided this answer into two parts:

TL;DR Why Kubespray fails on swapoff -a
How to install Kubernetes with Kubespray on LXC containers

TL;DR

Kubespray fails because swapoff -a returns a non-zero exit code (255) inside the containers.

A non-zero exit status indicates failure. This seemingly counter-intuitive scheme is used so there is one well-defined way to indicate success and a variety of ways to indicate various failure modes.

Gnu.org: Exit Status
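Because swapoff -a writes nothing to stdout or stderr when it fails here, the failure is visible only in its exit status, which an interactive shell does not display. A minimal bash sketch (the function name report_exit_code is hypothetical, not part of any tool) that runs a command and prints its exit status explicitly:

```shell
#!/usr/bin/env bash
# Run a command, then print its exit status, because a command that fails
# silently (like swapoff -a in the question) produces no output at all.
report_exit_code() {
  "$@"
  local rc=$?
  echo "exit code: $rc"
  # Propagate the original status to the caller
  return "$rc"
}
```

For example, report_exit_code /sbin/swapoff -a inside one of the containers would be expected to print exit code: 255, matching the rc reported in the Ansible log. Running /sbin/swapoff -a; echo $? gives the same information without the helper.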

Even if you set limits.memory.swap: "false" in the profile associated with the containers, the command will still fail with this error.

A workaround is to disable swap on the host system:

$ swapoff -a
delete the line associated with swap in /etc/fstab
$ reboot

After that, running $ swapoff -a in your containers should return a zero exit code.
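The fstab step can be scripted. A minimal sketch, assuming a standard fstab layout where the third field is the filesystem type; it comments the swap lines out instead of deleting them so they are easy to restore (the function name comment_out_swap_entries is hypothetical):

```shell
#!/usr/bin/env bash
# Comment out active swap entries in an fstab-style file.
# A line counts as a swap entry when its third field (filesystem type) is
# "swap"; lines that are already comments are left untouched.
comment_out_swap_entries() {
  local fstab="${1:-/etc/fstab}" tmp
  tmp=$(mktemp) || return 1
  awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "$fstab" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through the original file so its owner and mode are preserved
  cat "$tmp" > "$fstab"
  rm -f "$tmp"
}
```

On the host this would be used as cp /etc/fstab /etc/fstab.bak, then swapoff -a, then comment_out_swap_entries /etc/fstab, then reboot.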

How to install Kubernetes with Kubespray on LXC containers

Assuming you have created your LXC containers and have full SSH access to them, there are still a few things to take into account before running Kubespray.

When I ran Kubespray on LXC containers, I ran into issues with:

storage space
docker packages
kmsg
kernel modules
conntrack

Storage space

Make sure your storage pool has enough free space, as running out of it will cause cluster provisioning to fail. The default storage pool size may not be big enough to hold 5 nodes.
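A rough pre-flight check, assuming a directory-backed pool whose files live on an ordinary host filesystem. The pool path and the 50 GiB threshold in the usage line are placeholders, not values from the answer; for loop-backed ZFS or btrfs pools, df on a host directory does not reflect the pool's size, and lxc storage info <pool> is the better source. The helper name has_free_gib is hypothetical:

```shell
#!/usr/bin/env bash
# Succeed if the filesystem holding DIR has at least NEED_GIB GiB available.
has_free_gib() {
  local dir="$1" need_gib="$2" avail_kib
  # df -Pk prints POSIX-format output in 1024-byte blocks, one line per
  # filesystem; field 4 of the second line is the "Available" column
  avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kib" -ge $(( need_gib * 1024 * 1024 )) ]
}
```

Usage (placeholder path and size): has_free_gib /var/snap/lxd/common/lxd/storage-pools/default 50 || echo "storage pool may be too small"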

Docker packages

When provisioning the cluster, make sure you use the newest Kubespray version available, as older versions had an issue with Docker packages that were not compatible with each other.
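One way to follow this advice is to clone the upstream repository (github.com/kubernetes-sigs/kubespray) and check out its newest release tag. A small sketch, assuming release tags follow the vX.Y.Z pattern and that GNU sort -V is available; the helper name latest_release_tag is hypothetical:

```shell
#!/usr/bin/env bash
# Read git tags on stdin and print the newest stable release (vX.Y.Z).
# Pre-release tags such as v2.13.0-rc1 are ignored by the regex.
latest_release_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}
```

Usage: inside a fresh clone, git checkout "$(git tag | latest_release_tag)". Version sort matters here: a plain lexical sort would place v2.9.0 after v2.13.1.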

Kmsg

The /dev/kmsg character device node provides userspace access to the kernel's printk buffer.

Kernel.org: Documentation: dev-kmsg

/dev/kmsg is not available in LXC containers, and by default Kubespray will fail to provision the cluster when /dev/kmsg is missing on a node (LXC container).

There is a workaround for it. In each LXC container, run:

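# Hack required to provision K8s v1.15+ in LXC containers
# Create the device node now (char device, major 1, minor 11) unless it exists
[ -e /dev/kmsg ] || mknod /dev/kmsg c 1 11
# Recreate it on every boot. /etc/rc.d/rc.local is a CentOS/RHEL-style path;
# on Ubuntu 18.04 use /etc/rc.local instead, with "#!/bin/sh" as its first line
echo 'mknod /dev/kmsg c 1 11' >> /etc/rc.d/rc.local
chmod +x /etc/rc.d/rc.local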

Github.com: Justmeandopensource: lxd-provisioning: bootstrap-kube.sh
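Since the nodes in the question run Ubuntu 18.04, which has no /etc/rc.d/rc.local, a sketch of the boot-time part for Ubuntu follows. It assumes systemd's rc-local.service runs /etc/rc.local at boot when that file exists and is executable. It appends the line only once, so it is safe to re-run; it does not handle an existing rc.local that ends in exit 0 (the line would then have to go before it). The function name add_kmsg_to_rc_local is hypothetical:

```shell
#!/usr/bin/env bash
# Make an rc.local-style boot script recreate /dev/kmsg on every boot.
add_kmsg_to_rc_local() {
  local rc_file="${1:-/etc/rc.local}"
  local line='[ -e /dev/kmsg ] || mknod /dev/kmsg c 1 11'
  # Create the file with a shebang if it does not exist yet
  if [ ! -e "$rc_file" ]; then
    printf '#!/bin/sh\n' > "$rc_file"
  fi
  # Append the mknod line only if an identical line is not already present
  grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
  # The file must be executable for rc-local.service to run it
  chmod +x "$rc_file"
}
```

Usage, inside each container as root: add_kmsg_to_rc_local /etc/rc.local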

I also tried other workarounds:

adding lxc.kmsg = 1 to /etc/lxc/default.conf: this option is deprecated
running echo 'L /dev/kmsg - - - - /dev/console' > /etc/tmpfiles.d/kmsg.conf inside the container and then restarting it: this caused systemd-journald to sit at 100% usage of one CPU core

Kernel modules

The LXC/LXD system containers do not load kernel modules for their own use. What you do is get the host to load the kernel module, and this module could then be available in the container.

Linuxcontainers.org: How to add kernel modules to LXC container

Kubespray checks whether certain kernel modules are available on your nodes.

You will need to load the following modules on your host:

ip_vs
ip_vs_sh
ip_vs_rr
ip_vs_wrr

You can load the above modules with $ modprobe MODULE_NAME, or have them loaded automatically at boot by following this link: Cyberciti.biz: Linux how to load a kernel module automatically.
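A sketch of both halves of this step, assuming a systemd-based host (systemd-modules-load reads one module name per line from files in /etc/modules-load.d/) and containers that see the host's /proc/modules. The file name ipvs.conf and both function names are hypothetical. Note that modules compiled into the kernel never appear in /proc/modules, so missing_modules would report them even though they are available.

```shell
#!/usr/bin/env bash
# Write module names, one per line, in the modules-load.d format.
write_modules_conf() {
  local conf="$1"; shift
  printf '%s\n' "$@" > "$conf"
}

# Print each requested module that is not listed (first field) in a
# /proc/modules-style file.
missing_modules() {
  local proc_modules="$1"; shift
  local m
  for m in "$@"; do
    awk -v m="$m" '$1 == m { found = 1 } END { exit !found }' "$proc_modules" \
      || echo "$m"
  done
}
```

Usage: on the host, write_modules_conf /etc/modules-load.d/ipvs.conf ip_vs ip_vs_sh ip_vs_rr ip_vs_wrr to persist them and for m in ip_vs ip_vs_sh ip_vs_rr ip_vs_wrr; do modprobe "$m"; done to load them now; then inside a node, missing_modules /proc/modules ip_vs ip_vs_sh ip_vs_rr ip_vs_wrr should print nothing.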

Conntrack

You will need to install the conntrack package on the nodes and load a kernel module named nf_conntrack:

$ apt install conntrack -y
$ modprobe nf_conntrack

Without the above commands, Kubespray will fail at the step that checks whether conntrack is available.
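The container configuration in the question already uses LXD's linux.kernel_modules key, which makes LXD load the listed modules on the host before the container starts. An alternative to running modprobe by hand (an assumption, not what the answer author did) is to add the IPVS modules and nf_conntrack to that key. A sketch of a helper that merges names into the comma-separated value while keeping order and dropping duplicates; the name merge_module_list is hypothetical:

```shell
#!/usr/bin/env bash
# Merge extra module names into a comma-separated list (for example the
# value of linux.kernel_modules), preserving order and removing duplicates.
merge_module_list() {
  local current="$1"; shift
  {
    printf '%s\n' "$current" | tr ',' '\n'   # existing entries, one per line
    printf '%s\n' "$@"                       # new entries, one per line
  } | awk 'NF && !seen[$0]++' | paste -sd, -
}
```

Possible usage on the host, per container: lxc config set node1 linux.kernel_modules "$(merge_module_list "$(lxc config get node1 linux.kernel_modules)" ip_vs ip_vs_sh ip_vs_rr ip_vs_wrr nf_conntrack)", followed by lxc restart node1, since the modules are loaded at container start. The conntrack package still has to be installed inside each node.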

With these changes in place, you should be able to run a Kubernetes cluster with Kubespray in an LXC environment and get node output similar to this:

root@k8s1:~# kubectl get nodes -o wide
NAME   STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
k8s1   Ready    master   14h   v1.18.2   10.224.47.185   <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7
k8s2   Ready    master   14h   v1.18.2   10.224.47.98    <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7
k8s3   Ready    <none>   14h   v1.18.2   10.224.47.46    <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7
k8s4   Ready    <none>   14h   v1.18.2   10.224.47.246   <none>        Ubuntu 18.04.4 LTS   5.4.0-31-generic   docker://18.9.7
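To check a larger cluster at a glance, a small sketch that reads kubectl get nodes output and prints the nodes whose STATUS column is not exactly Ready. It assumes the default column order shown above (NAME first, STATUS second); cordoned nodes report Ready,SchedulingDisabled and would also be listed. The helper name not_ready_nodes is hypothetical:

```shell
#!/usr/bin/env bash
# Read `kubectl get nodes` output on stdin; print names of non-Ready nodes.
not_ready_nodes() {
  # Skip the header row if present, then compare the STATUS column
  awk 'NR == 1 && $1 == "NAME" { next } $2 != "Ready" { print $1 }'
}
```

Usage: kubectl get nodes | not_ready_nodes prints nothing when every node is Ready.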