Kubernetes volumes: when is StatefulSet necessary?

Question

I'm learning about k8s volumes and best practices, and while reading the documentation I've noticed that it seems you always need a StatefulSet resource if you want to implement persistence in your cluster:

"StatefulSet is the workload API object used to manage stateful applications."

I've followed some tutorials; some of them use StatefulSet, others don't.

In fact, let's say I want to persist some data: I can have my stateless Pods (even MySQL server Pods!) use a PersistentVolumeClaim that persists the state. If I stop and rerun the cluster, I can resume the state from the volume with no need for a StatefulSet.

Here is an example GitHub repo with a stateful MySQL app and no StatefulSet at all: https://github.com/shri-kanth/kuberenetes-demo-manifests

So do I really need a StatefulSet resource for databases in k8s? Or are there specific cases in which it is necessary?

score 2 · Answer 1 · answered Jul 13 '20 at 09:23

PVCs are not the only reason to use StatefulSets over Deployments. As the Kubernetes documentation states:

StatefulSets are valuable for applications that require one or more of the following:

Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
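To make the last two items of that list concrete, here is a small Python simulation of the order in which a StatefulSet using the default OrderedReady pod management policy creates, deletes, and updates Pods: creation goes from ordinal 0 upward, scale-down and rolling updates go from the highest ordinal downward, and a rolling-update partition leaves Pods with lower ordinals on the old revision. It is a simplified model under those assumptions (it ignores the wait for each Pod to become Running and Ready, and failures), not the controller's actual code; the StatefulSet name "web" is hypothetical.

```python
from typing import List


def scale_plan(name: str, current: int, desired: int) -> List[str]:
    """Order of Pod actions when scaling a StatefulSet from `current` to `desired` replicas.

    Simplified model of the OrderedReady policy: the real controller also waits
    for each Pod to be Running and Ready before acting on the next one.
    """
    if desired >= current:
        # Scale up: new Pods are created in increasing ordinal order.
        return [f"create {name}-{i}" for i in range(current, desired)]
    # Scale down: Pods are terminated in decreasing ordinal order.
    return [f"delete {name}-{i}" for i in range(current - 1, desired - 1, -1)]


def rolling_update_order(name: str, replicas: int, partition: int = 0) -> List[str]:
    """Order in which a RollingUpdate replaces Pods: highest ordinal first.

    Pods whose ordinal is lower than `partition` keep the old revision.
    """
    return [f"{name}-{i}" for i in range(replicas - 1, partition - 1, -1)]


if __name__ == "__main__":
    print(scale_plan("web", 0, 3))            # ['create web-0', 'create web-1', 'create web-2']
    print(scale_plan("web", 3, 1))            # ['delete web-2', 'delete web-1']
    print(rolling_update_order("web", 3))     # ['web-2', 'web-1', 'web-0']
    print(rolling_update_order("web", 3, 2))  # ['web-2']
```

A Deployment gives none of these guarantees: its Pods have randomized name suffixes and are replaced in no particular order.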

You can read more about the considerations for running a database on Kubernetes in the article "To run or not to run a database on Kubernetes".

score 2 · Answer 2 · answered Jul 13 '20 at 21:31

StatefulSet is not the same as PV+PVC.

A StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

In other words, it manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods.
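For comparison with the plain PV+PVC approach, here is a minimal StatefulSet sketched in Python as plain dictionaries and printed as JSON (kubectl accepts JSON manifests as well as YAML). The names ("db", "db-headless"), image, port, mount path, and storage size are hypothetical placeholders. The two fields with no Deployment equivalent are serviceName, which points at a headless Service (clusterIP: None) that you have to create yourself, and volumeClaimTemplates, from which one PVC per Pod is created.

```python
import json
from typing import Any, Dict

Manifest = Dict[str, Any]


def headless_service(name: str, app: str, port: int) -> Manifest:
    """Headless Service (no cluster IP) that gives each StatefulSet Pod its own DNS record."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": app},
            "ports": [{"port": port}],
        },
    }


def statefulset(name: str, service: str, replicas: int, storage: str) -> Manifest:
    """Minimal StatefulSet; image and mount path are hypothetical placeholders."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": service,  # the governing headless Service
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": "example/db:1.0",  # hypothetical image
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # Each Pod gets its own PVC created from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(headless_service("db-headless", "db", 3306), indent=2))
    print(json.dumps(statefulset("db", "db-headless", 3, "1Gi"), indent=2))
```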

So do I really need to use a StatefulSet resource for databases in k8s?

It depends on what you would like to achieve.

StatefulSet gives you:

A stable network ID (your Pods are always named $(statefulset name)-$(ordinal)).
Stable storage: when a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes associated with its PersistentVolumeClaims.
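The naming rules behind those two guarantees are simple enough to compute. The Python sketch below derives, for each ordinal, the Pod name, the per-Pod DNS name provided by the governing headless Service (pod.service.namespace.svc followed by the cluster domain, assumed here to be the default cluster.local), and the PVC name, which is the volumeClaimTemplate name followed by the Pod name. Because all three depend only on the StatefulSet name and the ordinal, a rescheduled Pod comes back with the same hostname and binds the same claim. The names "web", "nginx", and "www" are illustrative.

```python
from typing import Dict, List


def statefulset_identities(
    sts: str,
    service: str,
    namespace: str,
    claim_template: str,
    replicas: int,
    cluster_domain: str = "cluster.local",  # assumption: default cluster domain
) -> List[Dict[str, str]]:
    """Stable identity (Pod name, DNS name, PVC name) for each StatefulSet ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{sts}-{ordinal}"  # $(statefulset name)-$(ordinal)
        identities.append({
            "pod": pod,
            # Resolvable only if `service` is the headless Service named in spec.serviceName.
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            # PVCs created from volumeClaimTemplates are named <template>-<pod>.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


if __name__ == "__main__":
    for identity in statefulset_identities("web", "nginx", "default", "www", 2):
        print(identity)
```

With a Deployment, by contrast, a replacement Pod gets a new random name, so nothing ties it to a particular claim except what you hard-code in the Pod template.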

...MySql and no StatefulSet...

As you can see, if your goal is just to run a single RDBMS Pod (for example, MySQL) that stores all its data (the DB itself) on a PV+PVC, then a StatefulSet is definitely overkill.
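For that single-instance case, a Deployment with replicas: 1 plus a separately created PVC is enough. The Python sketch below builds such a pair as dictionaries. It sets the Deployment strategy to Recreate so the old Pod is stopped before the new one starts, because a rolling update would briefly run two MySQL Pods against the same data; the Kubernetes "Run a Single-Instance Stateful Application" tutorial takes the same approach. The names, MySQL image tag, and storage size are assumptions, and the plaintext root password is only a placeholder (a Secret would be the usual choice).

```python
import json
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def single_instance_mysql(name: str = "mysql", storage: str = "1Gi") -> List[Manifest]:
    """PVC + single-replica Deployment: persistence without a StatefulSet."""
    labels = {"app": name}
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{name}-pv-claim"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": storage}},
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # Stop the old Pod before starting the new one, so two MySQL
            # processes never use the same data directory at the same time.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": "mysql:8.0",  # assumption: any official MySQL tag
                        # Placeholder only; use a Secret for real credentials.
                        "env": [{"name": "MYSQL_ROOT_PASSWORD", "value": "change-me"}],
                        "ports": [{"containerPort": 3306}],
                        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/mysql"}],
                    }],
                    # The Pod mounts the claim by name; the data outlives the Pod.
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": pvc["metadata"]["name"]},
                    }],
                },
            },
        },
    }
    return [pvc, deployment]


if __name__ == "__main__":
    for manifest in single_instance_mysql():
        print(json.dumps(manifest, indent=2))
```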

However, if you need to run a Redis cluster (a distributed DB) :-D, it'll be close to impossible to do that without a StatefulSet (to the best of my knowledge, and based on numerous threads on the same topic on StackOverflow).

I hope that this info helps you.

Even for the Redis cluster, why would a StatefulSet be necessary? I have master election (if a node goes down) and load balancing among slave nodes for reads. It seems a stable memory/network ID isn't necessary this way... — Alessandro Argentieri, Jul 14 '20 at 06:55
Sorry, I made a typo. I was thinking of Apache ZooKeeper. — Nick, Jul 14 '20 at 07:36
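The ZooKeeper case mentioned in the comment shows why the stable identity matters: every ensemble member needs a fixed numeric id in its myid file, and every member's configuration lists all peers by address. With a StatefulSet, both can be derived from the ordinal. The Python sketch below assumes a StatefulSet named zk with a headless Service zk-hs in the default namespace (all hypothetical), uses the common convention myid = ordinal + 1, and uses ZooKeeper's conventional ports 2888 (quorum) and 3888 (leader election). It is an illustration, not a complete ZooKeeper configuration.

```python
import re
from typing import List


def zk_myid(hostname: str) -> int:
    """Derive a ZooKeeper server id from a StatefulSet Pod hostname such as 'zk-2'."""
    match = re.fullmatch(r"(.+)-(\d+)", hostname)
    if match is None:
        raise ValueError(f"not a StatefulSet Pod hostname: {hostname!r}")
    # Ordinals start at 0, but ZooKeeper server ids start at 1, so shift by one.
    return int(match.group(2)) + 1


def zk_server_lines(sts: str, service: str, namespace: str, replicas: int) -> List[str]:
    """server.N entries for zoo.cfg; they stay valid only because Pod DNS names are stable."""
    return [
        f"server.{ordinal + 1}={sts}-{ordinal}.{service}.{namespace}.svc.cluster.local:2888:3888"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    print("myid:", zk_myid("zk-2"))  # myid: 3
    print("\n".join(zk_server_lines("zk", "zk-hs", "default", 3)))
```

Inside a StatefulSet Pod the hostname is the Pod name, so a startup script could pass socket.gethostname() to zk_myid. Behind a Deployment the Pod names are random, so neither the id nor the peer list would survive a reschedule.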
