
I have a simple k8s installation with a few nodes and Ceph (kubernetes.io/rbd) as the storage class. I have a deployment with a single pod that uses a persistent volume from a persistent volume claim (ReadWriteOnce) of this storage class.

The node running this pod has failed (it has shown NotReady in the get nodes output for a long time, and it is physically dead).

K8s could not create a new pod for my deployment because of 'Multi-Attach error for volume "pvc-..." Volume is already exclusively attached to one node and can't be attached to another'.

I see that the PV is still bound to the failed node: "Status: Bound".

How can I force Kubernetes to forget about the old pod so that a new pod can bind to the persistent volume?

George Shuklin

1 Answer


It is a complex problem.

The kubelet daemon, which manages volume mounts, is supposed to report the new status of the volume so that the Scheduler can spawn a Pod on another node.

But the node is in the 'NotReady' status, which means Kubernetes cannot communicate with the kubelet to check the current status of its volumes. Kubernetes keeps the last status it received for the volume: "Bound". There is no way to reset that status without changing the state of the node.

I see only 2 workarounds here:

  1. Use a PVC in ReadWriteMany mode instead of ReadWriteOnce. CephFS can work in that mode, but RBD can't. That mode allows Kubernetes to claim the same volume on several nodes at the same time.
  2. Delete the failed node from the cluster. This removes all objects linked to the node, and the Scheduler will be able to claim your volume again.
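
For the second workaround, a minimal shell sketch could look like the one below. The function name `forget_failed_node` and the node name `worker-3` are made up for illustration, and the sketch assumes the node is permanently gone (as in the question), not just temporarily unreachable.

```shell
# Sketch only: remove a dead node from the cluster so the volume can be
# claimed again. forget_failed_node is a hypothetical helper, not a kubectl
# feature.
forget_failed_node() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: forget_failed_node <node-name>" >&2
    return 1
  fi
  # Look at the node first to confirm it is the NotReady one.
  kubectl get node "$node" || return 1
  # Remove the Node object from the cluster.
  kubectl delete node "$node" || return 1
  # Check where the replacement pod of the deployment gets scheduled.
  kubectl get pods -o wide
}
```

Call it as `forget_failed_node worker-3`, using the node name shown by `kubectl get nodes`.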
Anton Kostenko
  • I was able to remove the old PV and respawn a new pod for the same deployment. Now I have a PVC in the state 'Lost', but the data in the mount is intact in the new container. I feel like I found some nasty bug... – George Shuklin Apr 20 '18 at 14:25
  • Yep. That's why I wrote that I see only 2 ways which will not break anything :) – Anton Kostenko Apr 20 '18 at 14:42
  • A Ceph RBD volume cannot work in RWX mode. CephFS can, but it's something different. Sometimes it's undesirable to allow multiple writing pods, especially in the case of databases. – Adam Jul 24 '18 at 22:59
  • Yes, you are right, RBD does not support RWX. I edited my answer. – Anton Kostenko Jul 25 '18 at 07:43