Many of my workflows use pod IAM roles. As documented here, I must include fsGroup in the pod's securityContext so that non-root containers can read the generated identity token. The problem is that when I also include PVCs that point to CIFS PVs, those volumes fail to mount because they time out. Seemingly this is because kubelet tries to chown every file on the volume, which takes too long and trips the timeout. A minimal sketch of the setup follows the questions below. Questions…
- Why doesn't Kubernetes try to chown all of the files when hostPath is used instead of a PVC? All of the workflows were fine until I switched from hostPath to PVCs, and now the timeout happens.
- Why does this problem happen on CIFS PVCs but not NFS PVCs? The NFS PVCs continue to mount just fine, and fsGroup seemingly has no effect there: the group ID doesn't change on any of the files. The CIFS PVCs, however, no longer mount, seemingly due to the timeout. If it matters, I am using the native NFS PV type and this CIFS FlexVolume plugin, both of which worked great up until now.
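For concreteness, here is a minimal sketch of the kind of spec I'm describing (the names, image, PVC, and fsGroup value are hypothetical placeholders, not my real manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-workflow            # hypothetical name
spec:
  serviceAccountName: workflow-sa   # hypothetical service account annotated with an IAM role
  securityContext:
    fsGroup: 65534                  # needed so the non-root container can read
                                    # the projected identity token
  containers:
    - name: main
      image: alpine:3.18
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cifs-share       # hypothetical PVC bound to a CIFS PV; with
                                    # fsGroup set, mounting it now times out
```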
Overall, the goal of this post is to better understand how Kubernetes determines when to chown all of the files on a volume when fsGroup is set, so that I can make a good design decision going forward. Thanks for any help you can provide!
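My current, unverified understanding is that kubelet only applies fsGroup to volumes whose plugin advertises ownership-management support: the in-tree hostPath and NFS plugins do not, while a FlexVolume driver advertises it through an fsGroup capability in the response to its init call, which I believe defaults to true when the driver doesn't report it. If that's right, a driver response like the sketch below (shown as YAML for readability; the driver actually emits the equivalent JSON, and the field names are my assumption from reading kubelet's FlexVolume code) would opt the driver out of the recursive chown. Please correct me if I have this wrong.

```yaml
# Hypothetical FlexVolume `init` response; whether this CIFS driver
# reports capabilities at all is an assumption worth checking.
status: Success
capabilities:
  attach: false    # driver only handles mount/unmount, no attach/detach
  fsGroup: false   # ask kubelet to skip the fsGroup-based recursive chown
```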
References on Kubernetes chowning files
https://learn.microsoft.com/en-us/azure/aks/troubleshooting
> gid and uid are mounted as root or 0 by default. If gid or uid are set as non-root, for example 1000, Kubernetes will use chown to change all directories and files under that disk. This operation can be time consuming and may make mounting the disk very slow.
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
> By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup.
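For completeness, that same Kubernetes docs page describes an fsGroupChangePolicy field (beta in v1.20, stable in v1.23) that restricts the recursive chown to volumes whose root directory doesn't already match fsGroup. I haven't confirmed whether it applies to FlexVolume-backed PVs, but on a new enough cluster it looks like this:

```yaml
securityContext:
  fsGroup: 65534
  fsGroupChangePolicy: "OnRootMismatch"  # only chown recursively when the volume
                                         # root's owner/permissions don't match
```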