
I've deployed a GitLab 12.10 instance on microk8s 1.18 on my Ubuntu 19.10 server. I've noticed repeatedly that some of the pods go into status CrashLoopBackOff or Init:CrashLoopBackOff with error messages like the following:

Message: failed to create containerd task: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"rootfs_linux.go:58: mounting \\\"/var/snap/microk8s/common/var/lib/kubelet/pods/de572f6b-f738-4b1a-ae02-80fef062cabe/volume-subpaths/sidekiq-secrets/dependencies/2\\\" to rootfs \\\"/var/snap/microk8s/common/run/containerd/io.containerd.runtime.v1.linux/k8s.io/3ae6fc33b02192c3940af4ebc991a47b1fd0afc2533901af50ae5a5f93585d1c/rootfs\\\" at \\\"/var/snap/microk8s/common/run/containerd/io.containerd.runtime.v1.linux/k8s.io/3ae6fc33b02192c3940af4ebc991a47b1fd0afc2533901af50ae5a5f93585d1c/rootfs/srv/gitlab/config/secrets.yml\\\" caused \\\"no such file or directory\\\"\"": unknown
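For reference, this status and the full mount error can be pulled out of the cluster with commands like these (the `gitlab` namespace and `<pod-name>` are placeholders for the actual values):

```shell
# Pods stuck in CrashLoopBackOff / Init:CrashLoopBackOff show up here
kubectl get pods -n gitlab
# The full mount error appears in the pod's event list
kubectl describe pod <pod-name> -n gitlab
```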

Unfortunately, I'm not sure whether this problem is specific to GitLab or a general issue with my microk8s cluster. It's not always the same component that crashes; the ones that seem to fail most frequently are gitlab-runner, sidekiq and unicorn, but I've also seen others fail with a similar "no such file or directory" error from time to time.

When I delete a crashed pod, the new one that gets created starts without problems, then everything seems to work for a while until one or more of the pods crash again.

Any idea what might be causing this?

Jens
  • Hi Jens, welcome to S.F. You deployed GitLab 12.10 _how_? Via their supported Helm chart? And are you sure you have enough disk space to run all of GitLab on one Node? – mdaniel May 17 '20 at 03:44
  • Yes, I used their supported Helm chart and they also list some settings for a small deployment. But that only mentions CPUs and RAM, nothing about required disk space. The defaults for the Persistent Volume Claims are around 80GB, which is no problem, but I don't know how much storage is required apart from the PVCs. – Jens May 17 '20 at 05:01
  • You said it works when it gets recreated until another pod fails. Which one is crashing and why, can you post logs from those pods? – Crou May 20 '20 at 11:40

1 Answer


I had the same problem; initially, a full server reboot fixed it, but it then returned intermittently.

Finding

Finally, it turned out to be a permission issue with the "raw" secrets files (like `/var/snap/microk8s/common/var/lib/kubelet/pods/<pod_guid>/volume-subpaths/task-runner-secrets/task-runner/<file_no>`) for the 3 main pods (sidekiq, unicorn & task-runner). All other secret and config map file entries had `r--r-----` permissions with `root:<user group>` ownership, but these secret files, while also `r--r-----`, were owned by `<user>:<user group>`. That means "not readable for anyone but <user>", which sometimes works (if microk8s gets started in the user context) but sometimes fails (no idea about the logic of microk8s here).
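The mismatch can be checked with `stat`; here is a minimal local sketch, where a scratch file stands in for the real kubelet paths (which require root to read):

```shell
# Recreate the observed mode on a scratch file and read it back
tmp=$(mktemp -d)
echo 'secret' > "$tmp/2"
chmod 440 "$tmp/2"               # r--r-----, as on the broken files
stat -c '%a %U:%G %n' "$tmp/2"   # mode plus owner:group
```

On the real host, the same `stat -c '%a %U:%G %n'` invocation against the volume-subpaths files shows whether the owner is `root` (working) or `<user>` (broken).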

Solution

`chown root:<user group>` for these files fixes the issue.
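As a sketch, the fix can be applied to all secret files of an affected pod in one go (run as root on the microk8s host; `<pod_guid>` and `<user group>` are the placeholders used above, so substitute your actual pod directory and group):

```shell
# Re-own the mounted secret files so they match the working entries
find /var/snap/microk8s/common/var/lib/kubelet/pods/<pod_guid>/volume-subpaths \
  -type f -exec chown "root:<user group>" {} +
```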
Philipp