I have an on-premise Kubernetes cluster setup that has an NFS share (my-nfs.internal.tld) mounted to /exports/backup
on each node to create backups there.
Now I'm setting up my logging stack, and I want to make the data persistent, so I figured I could start by storing the indices on the NFS share.
I found three different ways to achieve this:
NFS-PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data  # must match the PVC's storageClassName, otherwise the claim will not bind
  nfs:
    server: my-nfs.internal.tld
    path: /path/to/exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc
This would, of course, require that my cluster gets access to the NFS share (instead of only the nodes, as it is currently set up).
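(For completeness: in all three variants the container then consumes the volume the same way via volumeMounts. A minimal sketch; the container name, image and mount path are just placeholders here, not part of my actual manifests:)

      containers:
        - name: elasticsearch             # placeholder container name
          image: elasticsearch:7.10.1     # placeholder image
          volumeMounts:
            - name: logging-data-volume   # must match the volume name under .spec.template.spec.volumes
              mountPath: /usr/share/elasticsearch/data   # placeholder mount path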
hostPath-PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data  # must match the PVC's storageClassName, otherwise the claim will not bind
  hostPath:
    path: /exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc
hostPath mount in deployment
As the NFS share is mounted to all my nodes, I could also just use the host path directly in the deployment without pinning anything.
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          hostPath:
            path: /exports/backup/logging-data
            type: DirectoryOrCreate
So my question is: is there really any difference between these three? I'm pretty sure all three work; I have already tested the second and third, but not yet the first (in this specific setup, at least). The second and third solutions in particular seem very similar to me. The second makes it easier to re-use deployment files on multiple clusters, I think, as you can use persistent volumes of different types without changing the volumes
part of the deployment (see the sketch after this paragraph). But is there any difference beyond that? Performance, maybe? Or is one of them deprecated and about to be removed?
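To illustrate that point: the PV could be swapped for, e.g., a cloud disk while the PVC and the deployment's volumes section stay exactly as above. A rough sketch using the legacy in-tree EBS volume source; the volume ID is just a placeholder, and a current cloud cluster would more likely use a CSI driver or dynamic provisioning instead:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  awsElasticBlockStore:               # any other volume source works the same way
    volumeID: vol-0123456789abcdef0   # placeholder volume ID
    fsType: ext4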
I found a tutorial mentioning that the hostPath-PV only works on single-node clusters. But I'm sure it also works in my case here, since the NFS share is mounted at the same path on every node. The comment was probably about something like: "On multi-node clusters the data differs depending on which node a pod is scheduled to."
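If the hostPath did not point at the same shared NFS mount on every node, the deployment would presumably have to be pinned to a single node to keep the data consistent. A minimal sketch of such pinning inside the pod template spec, with a placeholder node name:

    spec:
      ...
      nodeSelector:
        kubernetes.io/hostname: node-01   # placeholder node name
      volumes:
        - name: logging-data-volume
          hostPath:
            path: /exports/backup/logging-data
            type: DirectoryOrCreate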
From reading a lot of documentation and how-tos, I understand that the first one is the preferred solution. I would probably also go for it, as it is the easiest to replicate in a cloud setup. But I do not really understand why it is preferred over the other two.
Thanks in advance for your input on the matter!