I am running a ZooKeeper Kubernetes StatefulSet with 3 replicas for my Kafka clusters, with pods named zookeeper-0, zookeeper-1, and zookeeper-2, and a liveness probe enabled that uses the ruok command. If any StatefulSet pod gets restarted because of a failure, the quorum breaks and ZooKeeper stops working, even after the failed pod has come back up and its liveness probe responds ok.
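For reference, the liveness probe is roughly the following; the port and timing values here are illustrative rather than my exact settings:

```yaml
livenessProbe:
  exec:
    command:
      - sh
      - -c
      # send the ZooKeeper "ruok" four-letter word and expect "imok" back
      - 'echo ruok | nc -w 2 localhost 2181 | grep imok'
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
```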
When this happens, I have to manually restart all ZooKeeper instances to get the ensemble working again. The same thing happens when I do a helm upgrade: the rolling update restarts zookeeper-2 first, then zookeeper-1, and finally zookeeper-0, but ZooKeeper only seems to recover if I start all instances together. So after every helm upgrade I have to restart all instances manually.
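In case it matters, as far as I can tell the chart leaves the StatefulSet on the default policies, which would explain the restart order I am seeing (pods are rolled from the highest ordinal down). A simplified sketch of what I believe is rendered, assuming my chart does not override these fields:

```yaml
spec:
  podManagementPolicy: OrderedReady   # default: pods are created/started one at a time, in order
  updateStrategy:
    type: RollingUpdate               # default: on upgrade, pods are replaced starting from
                                      # the highest ordinal (zookeeper-2) down to zookeeper-0
```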
My question is:
What could be the reason for this behaviour? And what is the best way to run a 100% reliable ZooKeeper StatefulSet in a Kubernetes environment?