
k8s newbie here.

StatefulSets create pods with (a) predefined names and (b) an order. In my case I don't need the order (b), and that is what is giving me trouble. (a) is useful because I need to keep the state if a container dies.

For example, I have [ pod-0, pod-1, pod-2 ] and just want pod-0 to die, but this is what happens:

This is expected:

  1. [ pod-0:Running, pod-1:Running, pod-2:Running ]
  2. My app needs to scale down to 2 replicas by killing pod-0, so I run "k delete pod/pod-0" and set "replicas: 2"
  3. [ pod-0:Terminating, pod-1:Running, pod-2:Running ]

I want to keep this state!

  4. [ pod-1:Running, pod-2:Running ]

This, I don't want, but I can't prevent K8s from doing it:

  5. [ pod-0:Starting, pod-1:Running, pod-2:Running ] (K8s recreates pod-0!)
  6. [ pod-0:Running, pod-1:Running, pod-2:Terminating ] (K8s terminates the highest ordinal, not the pod I chose!)
  7. [ pod-0:Running, pod-1:Running ] (the whole set has been shifted!)

How can I achieve the desired behavior with K8s (keep a set of pods with non-sequential names)?
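
For reference, these are roughly the commands behind step 2, as a sketch (assuming the manifest further below, whose pods are named contextcf-0, contextcf-1, contextcf-2):

$ kubectl delete pod contextcf-0                    # kill the one specific pod I want gone
$ kubectl scale statefulset contextcf --replicas=2  # ...and ask for only 2 replicas
$ kubectl get pods -l name=contextcf -w             # instead of keeping contextcf-1/contextcf-2,
                                                    # the controller recreates contextcf-0 and
                                                    # terminates contextcf-2 (steps 5-7 above)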

I've seen a promising "Advanced StatefulSet" by OpenKruise.io, which would allow this, but the product is not yet mature for production. At least, it does not work on minikube (minikube 1.16.0, docker 19.03.13, OpenKruise 0.7.0).
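
The OpenKruise docs describe a reserveOrdinals field on the Advanced StatefulSet which, if I understand it correctly, expresses exactly this: tell the controller which ordinals to skip, so scaling to 2 replicas with ordinal 0 reserved would keep contextcf-1 and contextcf-2 instead of recreating contextcf-0. A sketch only, unverified (the apiVersion and field may differ between releases):

# Unverified sketch based on the OpenKruise docs; apiVersion and fields may vary by release.
apiVersion: apps.kruise.io/v1beta1
kind: StatefulSet                  # OpenKruise's Advanced StatefulSet, not apps/v1
metadata:
  name: contextcf
spec:
  replicas: 2
  reserveOrdinals:
    - 0                            # skip ordinal 0 instead of terminating the highest ordinal
  serviceName: contextcf
  selector:
    matchLabels:
      name: contextcf
  template:
    metadata:
      labels:
        name: contextcf
    spec:
      containers:
        - name: contextcf
          image: (my-registry)/contextcf:1.0.0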


Someone asked for my manifest; here it goes:

kind: StatefulSet
apiVersion: apps/v1

metadata:
  name: contextcf
  labels:
    name: contextcf
spec:
  serviceName: contextcf
  selector:
    matchLabels:
      name: contextcf
  replicas: 3
  template:
    metadata:
      labels:
        name: contextcf
    spec:
      containers:
        - name: contextcf
          image: (my-registry)/contextcf:1.0.0
          ports:
            - name: web
              containerPort: 80
# Volume sections removed, no issues there. The application is as simple as this.
RodolfoAP
  • Have you considered scaling down your StatefulSet as described in https://github.com/luksa/statefulset-scaledown-controller, https://stackoverflow.com/questions/62066640/stop-all-pods-in-a-statefulset-before-scaling-it-up-or-down and https://medium.com/@marko.luksa/graceful-scaledown-of-stateful-apps-in-kubernetes-2205fc556ba9 ? – Malgorzata Jan 21 '21 at 09:59
  • @Malgorzata You are suggesting I run through steps 5, 6, 7 and then manage the data. a) There is no problem with the data; my application manages it well. b) The objective is not data management but avoiding steps 5, 6, 7. c) The link points to an abandoned application; OpenKruise seems ideal, but it is still buggy. – RodolfoAP Jan 21 '21 at 10:11
  • For a StatefulSet with N replicas, each Pod in the StatefulSet will be assigned an integer ordinal, from 0 up through N-1, that is unique over the set. The API will not accept leaving only ordinals 1-2 when pod-0 is deleted; ordinals are always filled from the beginning, in order. – Malgorzata Jan 29 '21 at 09:04

1 Answer


Can you attach your YAML files?

I have [ pod-0, pod-1, pod-2 ], and just want pod-0 to die, but this is what happens

I can't reproduce this with the simplest StatefulSet

$ cat sts.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 3 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web

Newly created pods controlled by the StatefulSet

$ k -n test2 get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          19s   10.8.252.144   k8s-vm04   <none>           <none>
web-1   1/1     Running   0          12s   10.8.252.76    k8s-vm03   <none>           <none>
web-2   1/1     Running   0          6s    10.8.253.8     k8s-vm02   <none>           <none>

Try to delete web-0

$ k -n test2 delete pod web-0
pod "web-0" deleted

web-0 is in Terminating status

$ k -n test2 get pods -o wide
NAME    READY   STATUS        RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
web-0   0/1     Terminating   0          47s   10.8.252.144   k8s-vm04   <none>           <none>
web-1   1/1     Running       0          40s   10.8.252.76    k8s-vm03   <none>           <none>
web-2   1/1     Running       0          34s   10.8.253.8     k8s-vm02   <none>           <none>

web-0 is in ContainerCreating status

$ k -n test2 get pods -o wide
NAME    READY   STATUS              RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
web-0   0/1     ContainerCreating   0          1s    <none>        k8s-vm04   <none>           <none>
web-1   1/1     Running             0          45s   10.8.252.76   k8s-vm03   <none>           <none>
web-2   1/1     Running             0          39s   10.8.253.8    k8s-vm02   <none>           <none>

All the pods are Running

NAME    READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          1m21s   10.8.252.145   k8s-vm04   <none>           <none>
web-1   1/1     Running   0          2m59s   10.8.252.76    k8s-vm03   <none>           <none>
web-2   1/1     Running   0          2m5s    10.8.253.8     k8s-vm02   <none>           <none>

The other pods kept running and never went into the Terminating state

If you are talking about scaling the StatefulSet, then statefulset.spec.podManagementPolicy might be helpful for you:

$ k explain statefulset.spec.podManagementPolicy
KIND:     StatefulSet
VERSION:  apps/v1

FIELD:    podManagementPolicy <string>

DESCRIPTION:
     podManagementPolicy controls how pods are created during initial scale up,
     when replacing pods on nodes, or when scaling down. The default policy is
     `OrderedReady`, where pods are created in increasing order (pod-0, then
     pod-1, etc) and the controller will wait until each pod is ready before
     continuing. When scaling down, the pods are removed in the opposite order.
     The alternative policy is `Parallel` which will create pods in parallel to
     match the desired scale without waiting, and on scale down will delete all
     pods at once.
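
For example, applied to the sts.yaml above, it would be set like this (only the podManagementPolicy line is new):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  podManagementPolicy: Parallel   # default is OrderedReady
  serviceName: "nginx"
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web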
Konstantin Vustin
  • Ok, what do you need the StatefulSet for? Your desired scaling order is too custom; it's really hard to imagine any situation where it might be useful. I mean, what is the case where you need to scale down from the beginning of the order? – Konstantin Vustin Jan 21 '21 at 11:13
  • You say you only need the state, not the order. So the order does not matter to you, but at the same time you need a custom order. Those two statements contradict each other. – Konstantin Vustin Jan 21 '21 at 11:33
  • That discussion is about Deployments, not StatefulSets. – Konstantin Vustin Jan 21 '21 at 12:46
  • This answer is not useful (just deleted some comments). The requirement is to have containers with state, not order. In other words, to downscale by targeting specific containers to die, and avoid reordering (see 5,6,7). Thanks, anyway. – RodolfoAP Jan 21 '21 at 13:56
  • Setting `podManagementPolicy` to `Parallel` in `statefulset.spec` will give you the state without an order. Good luck. – Konstantin Vustin Jan 21 '21 at 14:54
  • Sadly, no. Although the documentation states so, it only provides parallel start and termination (downscaling from 10 to 5 will kill 5 in parallel), but it still forces "ordered set termination" (as stated by the doc); that is, it only allows the pods numbered 0 to 4 to persist. I tested this: I sent the kill signal to pods 0 to 5 of 10 and rescaled to 5, so **ALL 10** appeared "Terminating" (parallel works OK), and then 0 to 5 were recreated. – RodolfoAP Jan 21 '21 at 16:57