6

Occasionally I need to perform a rolling replacement of all Pods in my StatefulSet such that all PVs are also recreated from scratch. The reason is to get rid of all underlying hard drives that use old versions of the encryption key. This operation should not be confused with regular rolling upgrades, during which I still want the volumes to survive Pod terminations. The best routine I have figured out so far is the following (a rough command sketch follows the list):

  1. Delete the PV.
  2. Delete the PVC.
  3. Delete the Pod.
  4. Wait until all deletions complete.
  5. Manually recreate the PVC deleted in step 2.
  6. Wait for the new Pod to finish streaming data from other Pods in the StatefulSet.
  7. Repeat from step 1. for the next Pod.
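
Roughly, with names taken from the error below (volume claim template foo, StatefulSet bar, so the first Pod is bar-0 and its claim is foo-bar-0) and foo-bar-0-pvc.yaml standing in for a manifest I keep that matches the volumeClaimTemplate, the routine looks something like this:

# 1.-3. Delete the PV, the PVC and the Pod for one ordinal
$ PV=$(kubectl get pvc foo-bar-0 -o jsonpath='{.spec.volumeName}')
$ kubectl delete pv "$PV" --wait=false
$ kubectl delete pvc foo-bar-0 --wait=false
$ kubectl delete pod bar-0

# 4. Wait until the claim is really gone (deleting the Pod releases it)
$ kubectl wait --for=delete pvc/foo-bar-0 --timeout=10m

# 5. Recreate the PVC from the manifest
$ kubectl apply -f foo-bar-0-pvc.yaml

# 6. The StatefulSet controller recreates bar-0; wait until it is Ready
#    (assuming the readiness probe only passes once data has streamed in)
$ kubectl wait --for=condition=Ready pod/bar-0 --timeout=1h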

I'm not happy about step 5. I wish the StatefulSet recreated the PVC for me, but unfortunately it does not. I have to do it myself, otherwise Pod creation fails with the following error:

Warning  FailedScheduling   3s (x15 over 15m)  default-scheduler   persistentvolumeclaim "foo-bar-0" not found

Is there a better way to do that?

lopek
  • What does "manually" mean in your case? What stops you from specifying the PV/PVC as a YAML file and deleting/creating them from that file? – Anton Matsiuk Mar 26 '20 at 12:34
  • Nothing stops me, that's exactly what I did. But I'm hoping there is a cleaner way. The PVC template is configured on the StatefulSet, and the StatefulSet controller created the original PVC. I would like to avoid bypassing the StatefulSet and creating the PVC 'from file' (step 5), because that can be error prone. – lopek Mar 26 '20 at 12:46

4 Answers

5

I just recently had to do this. The following worked for me:

# Delete the PVC
$ kubectl delete pvc <pvc_name>

# Delete the underlying statefulset WITHOUT deleting the pods
$ kubectl delete statefulset <statefulset_name> --cascade=false 

# Delete the pod with the PVC you don't want
$ kubectl delete pod <pod_name>

# Apply the statefulset manifest to re-create the StatefulSet, 
# which will also recreate the deleted pod with a new PVC
$ kubectl apply -f <statefulset_yaml>
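
To sanity-check the result, something along these lines (not part of the original recipe; <pvc_name> is a placeholder) can be used to watch the Pod come back with a freshly bound claim:

# Watch the Pod and PVC being recreated by the StatefulSet controller
$ kubectl get pod,pvc -w

# The claim should now be bound to a brand-new PV
$ kubectl get pvc <pvc_name> -o jsonpath='{.spec.volumeName}'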

jpdstan
  • On later Kubernetes versions, the argument `--cascade=false` should be replaced with `--cascade=orphan` – mloskot Feb 23 '23 at 19:48
3

This is described in https://github.com/kubernetes/kubernetes/issues/89910. The workaround proposed there, deleting the new Pod that is stuck in Pending, works: the second time the Pod is replaced, a new PVC is created. It was marked as a duplicate of https://github.com/kubernetes/kubernetes/issues/74374 and reported as potentially fixed in 1.20.
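
In command form the workaround is essentially one extra Pod deletion; a sketch, reusing the placeholder names foo-bar-0/bar-0 from the question:

# After deleting the PVC and the Pod, the replacement Pod gets stuck in Pending
# because its claim is missing; delete it once more and the StatefulSet
# controller should create a fresh PVC together with the next replacement Pod.
$ kubectl delete pod bar-0
$ kubectl get pvc foo-bar-0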

Ben Langfeld
0

I tried your steps above, but deleting the PVC either failed or it got auto-recreated before I noticed. Anyway, on modern Kubernetes (I'm using GKE Autopilot 1.25.9) the following steps may be enough:

  1. Delete the PV
  2. Delete the Pod

At least for me, this caused the underlying disk to be replaced without having to recreate the PVC manually.
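
As a rough sketch (placeholder names; deleting a bound PV only marks it Terminating until the Pod releases it):

# Find the PV bound to the claim, then delete the PV and the Pod
$ kubectl get pvc <pvc_name> -o jsonpath='{.spec.volumeName}'
$ kubectl delete pv <pv_name> --wait=false   # stays Terminating until released
$ kubectl delete pod <pod_name>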

P.S. I needed to do this to recover an Elasticsearch node that had a full disk and refused to start (fatal exception while booting Elasticsearch: java.io.UncheckedIOException: Failed to load persistent cache). I didn't want to expand the disk size as many guides describe, because the real problem was that data cleanup in the cluster was broken, so more space wasn't actually needed.

Kristofer
-2

It seems like you're using a persistent volume the wrong way. It's designed to keep data between roll-outs, not to delete it. There are other ways to renew the keys: you can use a Kubernetes Secret or ConfigMap to mount the key into the Pod, and then you only need to recreate the Secret during a rolling update.
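
For example, if the key were a file the application reads, the rotation could look roughly like this (a sketch only; app-key, new-key.pem and <statefulset_name> are placeholders, and as the comment below points out this does not apply to KMS-managed disk encryption):

# Replace the Secret holding the key, then roll the StatefulSet
$ kubectl create secret generic app-key --from-file=key=./new-key.pem \
    --dry-run=client -o yaml | kubectl apply -f -
$ kubectl rollout restart statefulset <statefulset_name>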

Anton Matsiuk
  • In 95% of regular maintenance I do want the data to survive restarts/rollouts; that's why I'm using a StatefulSet and persistent volumes. What I described in my question is a routine that I will only need to run occasionally. The keys I'm talking about are encryption keys stored in Cloud KMS and used to encrypt GCP Persistent Disks, so k8s Secrets are not helpful here. – lopek Mar 26 '20 at 13:06