
I have an on-premises Kubernetes cluster set up that has an NFS share (my-nfs.internal.tld) mounted to /exports/backup on each node, which is used to create backups there.

Now I'm setting up my logging stack and I want to make the data persistent, so I figured I could start by storing the indices on the NFS.

Now I found three different ways to achieve this:

NFS-PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  nfs:
    server: my-nfs.internal.tld
    path: /path/to/exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc

This would, of course, require that my cluster gets access to the NFS (instead of only the nodes, as it is currently set up).

hostPath-PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logging-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  hostPath:
    path: /exports/backup/logging-data/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: logging-data
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          persistentVolumeClaim:
            claimName: logging-data-pvc

hostPath mount in deployment

As the NFS is mounted on all my nodes, I could also just use the host path directly in the deployment without pinning anything.

apiVersion: apps/v1
kind: Deployment
...
spec:
  ...
  template:
    ...
    spec:
      ...
      volumes:
        - name: logging-data-volume
          hostPath:
            path: /exports/backup/logging-data
            type: DirectoryOrCreate

So my question is: Is there really any difference between these three? I'm pretty sure all three work; I have already tested the second and third, but I have not yet been able to test the first (in this specific setup, at least). The second and third solutions in particular seem very similar to me. The second makes it easier to reuse deployment files on multiple clusters, I think, as you can use persistent volumes of different types without changing the volumes part of the deployment. But is there any difference beyond that? Performance, maybe? Or is one of them deprecated and going to be removed soon?

I found a tutorial mentioning that the hostPath-PV only works on single-node clusters, but I'm sure it also works in my case here. Maybe the comment meant something like: "On multi-node clusters the data changes when a pod is deployed to a different node."

From reading a lot of documentation and how-tos I understand that the first one is the preferred solution. I would probably also go for it, as it is the easiest to replicate in a cloud setup. But I do not really understand why it is preferred over the other two.

Thanks in advance for your input on the matter!

Max N.

1 Answer


The NFS is indeed the preferred solution:

An nfs volume allows an existing NFS (Network File System) share to be mounted into a Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of an nfs volume are preserved and the volume is merely unmounted. This means that an NFS volume can be pre-populated with data, and that data can be shared between pods. NFS can be mounted by multiple writers simultaneously.

So, an NFS is useful for two reasons:

  • Data is persistent.

  • It can be accessed from multiple pods at the same time and the data can be shared between pods.

See the NFS example for more details.
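For illustration, the share from the question could also be referenced directly with an `nfs` volume in a Pod (or in the Deployment's pod template), without creating a PV/PVC first. The pod name and image below are placeholders; server and path reuse the values from the question:

apiVersion: v1
kind: Pod
metadata:
  name: nfs-direct-example   # placeholder name
spec:
  containers:
    - name: app
      image: busybox         # placeholder image
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: logging-data-volume
          mountPath: /data   # where the share appears inside the container
  volumes:
    - name: logging-data-volume
      nfs:                   # mount the share directly, no PV/PVC involved
        server: my-nfs.internal.tld
        path: /path/to/exports/backup/logging-data/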

The hostPath, on the other hand:

A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.

Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes

The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume

hostPath is not recommended for several reasons:

  • You don't directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.

  • You expose your cluster to security threats.

  • If a node goes down, the pod needs to be scheduled on another node, where your locally provisioned volume will not be available.

The hostPath would be good if, for example, you would like to use it for a log collector running in a DaemonSet, as sketched below. Other than that, it would be better to use the NFS.
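A minimal sketch of that DaemonSet use case (the image and paths are illustrative, not taken from the question):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: fluent/fluent-bit:1.9   # example log collector image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log               # node-local logs: a legitimate hostPath use case
            type: Directory

Here every node runs one collector pod that reads the logs of its own node, which is exactly the situation where node-local hostPath data is what you want.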

Wytrzymały Wiktor
  • Ok, but as you see from my post, I'm using an NFS in all three scenarios. Only in two cases do I mount the path where the host has mounted the NFS share. So your two points, (i) data is persistent and (ii) it can be accessed from multiple pods (also (iii) it can be pre-populated), are true for all three. Also, since the NFS is mounted on all worker nodes, I circumvent the problem that "you're not guaranteed that the pod will actually be scheduled on the node that has the data volume". – Max N. Jan 05 '21 at 11:13
  • Yes, but using both NFS and `hostPath` is not necessary and could be wasteful resource-wise. Also, it would still be prone to security threats. The more recommended way would be to simply use a [dynamic volume provisioner](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#dynamic) that would create the matching PersistentVolume for you (a sketch of this approach follows after these comments). You mentioned Cloud, and for example GCP has [Persistent volumes and dynamic provisioning](https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes). – Wytrzymały Wiktor Jan 05 '21 at 13:18
  • Hm. I know cloud providers have their ways of providing persistent volumes; I have used quite a lot of them already. Regarding NFS vs. NFS-on-hostPath: that's exactly what I mean by "I cannot figure out why". I always read the statement that it is the better solution, I just don't understand why :( Why this might be wasteful resource-wise, for example, and why my cluster would be prone to security threats. Could you elaborate on those? Or provide some further reading material? – Max N. Jan 05 '21 at 14:03
  • For example, if your pod is hacked and the hacker gets access to your host through the `hostPath` by writing to it, then your entire cluster is effectively compromised. Also, you can use NFS-on-hostPath, sure, but why? Using the NFS share directly instead of going through a `hostPath` means you don't have to manage mounting the NFS on the hosts yourself or manage the local volumes. It's just easier in general, and thus you will find many opinions pointing towards that solution. – Wytrzymały Wiktor Jan 05 '21 at 14:25
  • If I mount `/export/shared` from the NFS to `/mnt/my-nfs/shared` and mount `/mnt/my-nfs/shared` somewhere in my container, why could an attacker then take over my whole cluster? I would guess he would be able to access exactly the same data he could access when I mount `/export/shared` from the NFS directly into the container. Of course, if I mounted `/` from the host (I don't know if that is even possible), he would have access to the full host. But that's not what I am doing :) – Max N. Jan 05 '21 at 15:10
  • An example as to why I would go for the NFS-on-host would be that I'm not in charge of the NFS, and it is only accessible from the hosts, not from K8s. Another might be that my host's NFS implementation is better/faster/more performant than the K8s one. Not saying it is, but who knows, I haven't tried :) – Max N. Jan 05 '21 at 15:13
  • But OK, as stated in the original post: I would also go for the direct NFS mount, just because it _feels right_; I was looking for some backup for that feeling :) This is more of a theoretical question, and I'm mainly interested in the difference between the second and third scenarios. Is there any real difference between those two? Or are they interchangeable, without any benefits or downsides? – Max N. Jan 05 '21 at 15:14
  • In your particular use case it shouldn't make too much of a difference. You can mount the volume directly or use a claim as a volume. It's up to you. – Wytrzymały Wiktor Jan 05 '21 at 15:44
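For reference, the dynamic-provisioning route mentioned in the comments could look roughly like this, assuming something like the NFS subdir external provisioner is installed in the cluster; the provisioner name and parameters below are assumptions that depend on how it is deployed:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner   # assumed provisioner name; must match your installation
parameters:
  archiveOnDelete: "false"   # assumed parameter: whether data is archived when the PVC is deleted
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logging-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-client   # matching PVs are then created on demand
  resources:
    requests:
      storage: 10Gi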