
I've deployed a GitLab 12.10 instance on microk8s 1.18 on my Ubuntu 19.10 server. I've noticed repeatedly that some of the pods go into status CrashLoopBackOff or Init:CrashLoopBackOff with error messages like the following:

Message: failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"rootfs_linux.go:58: mounting \\\"/var/snap/microk8s/common/var/lib/kubelet/pods/de572f6b-f738-4b1a-ae02-80fef062cabe/volume-subpaths/sidekiq-secrets/dependencies/2\\\" to rootfs \\\"/var/snap/microk8s/common/run/containerd/io.containerd.runtime.v1.linux/k8s.io/3ae6fc33b02192c3940af4ebc991a47b1fd0afc2533901af50ae5a5f93585d1c/rootfs\\\" at \\\"/var/snap/microk8s/common/run/containerd/io.containerd.runtime.v1.linux/k8s.io/3ae6fc33b02192c3940af4ebc991a47b1fd0afc2533901af50ae5a5f93585d1c/rootfs/srv/gitlab/config/secrets.yml\\\" caused \\\"no such file or directory\\\"\"": unknown
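For reference, this status and the full mount error can be pulled out of the cluster with commands like these (the `gitlab` namespace and `<pod-name>` are placeholders for the actual values):

```shell
# Pods stuck in CrashLoopBackOff / Init:CrashLoopBackOff show up here
kubectl get pods -n gitlab
# The full mount error appears in the pod's event list
kubectl describe pod <pod-name> -n gitlab
```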

Unfortunately, I'm not sure whether this problem is specific to GitLab or a general issue with my microk8s cluster. It's not always the same component that crashes; the ones that seem to fail most frequently are gitlab-runner, sidekiq and unicorn, but I've also seen others fail with a similar "no such file or directory" error from time to time.

When I delete a crashed pod, the new one that gets created starts without problems, then everything seems to work for a while until one or more of the pods crash again.

Any idea what might be causing this?

Jens
  • Hi Jens, welcome to S.F. You deployed GitLab 12.10 _how_? Via their supported Helm chart? And are you sure you have enough disk space to run all of GitLab on one Node? – mdaniel May 17 '20 at 03:44
  • Yes, I used their supported Helm chart and they also list some settings for a small deployment. But that only mentions CPUs and RAM, nothing about required disk space. The defaults for the Persistent Volume Claims are around 80GB, which is no problem, but I don't know how much storage is required apart from the PVCs. – Jens May 17 '20 at 05:01
  • You said it works when it gets recreated until another pod fails. Which one is crashing and why, can you post logs from those pods? – Crou May 20 '20 at 11:40

1 Answer


I had the same problem; initially, a full server reboot fixed it, but it then returned intermittently.

Finding

Finally, it turned out to be a permission issue with the "raw" secrets files (like `/var/snap/microk8s/common/var/lib/kubelet/pods/<pod_guid>/volume-subpaths/task-runner-secrets/task-runner/<file_no>`) for the 3 main pods (sidekiq, unicorn & task-runner). All other secret and config map file entries had `r--r-----` permissions with `root:<user group>` ownership, but these secret files, while also `r--r-----`, were owned by `<user>:<user group>`. That means "not readable for anyone but <user>", which sometimes works (if microk8s gets started in the user context) but sometimes fails (no idea about the logic of microk8s here).
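The mismatch can be checked with `stat`; here is a minimal local sketch, where a scratch file stands in for the real kubelet paths (which require root to read):

```shell
# Recreate the observed mode on a scratch file and read it back
tmp=$(mktemp -d)
echo 'secret' > "$tmp/2"
chmod 440 "$tmp/2"               # r--r-----, as on the broken files
stat -c '%a %U:%G %n' "$tmp/2"   # mode plus owner:group
```

On the real host, the same `stat -c '%a %U:%G %n'` invocation against the volume-subpaths files shows whether the owner is `root` (working) or `<user>` (broken).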

Solution

`chown root:<user group>` for these files fixes the issue.
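As a sketch, the fix can be applied to all secret files of an affected pod in one go (run as root on the microk8s host; `<pod_guid>` and `<user group>` are the placeholders used above, so substitute your actual pod directory and group):

```shell
# Re-own the mounted secret files so they match the working entries
find /var/snap/microk8s/common/var/lib/kubelet/pods/<pod_guid>/volume-subpaths \
  -type f -exec chown "root:<user group>" {} +
```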
Philipp