
I have a bucket folder in Google Cloud with about 47GB of data in it. I start a new Kubernetes StatefulSet (in my Google Cloud Kubernetes cluster). The first thing the container inside the StatefulSet does is run gsutil -m rsync -r gs://<BUCKET_PATH> <LOCAL_MOUNT_PATH> to sync the bucket folder contents to a locally mounted folder, which is backed by a Kubernetes Persistent Volume. The Persistent Volume Claim for this StatefulSet requests 125Gi of storage and is used only for this rsync. But the gsutil sync eventually hits a wall where the pod runs out of disk space (space in the Persistent Volume) and gsutil throws an error: [Errno 28] No space left on device. This is weird, because I only need to copy 47GB of data from the bucket, and the Persistent Volume should have 125Gi of storage available.

I can confirm the Persistent Volume Claim and the Persistent Volume have been provisioned with the appropriate sizes by using kubectl get pvc and kubectl get pv. If I run df -h inside the pod (kubectl exec -it <POD_NAME> -- df -h) I can see that the mounted path exists and that it has the expected size (125Gi). Using df -h during the sync I can see that it does indeed take up all the available space in the Persistent Volume when it finally hits No space left on device.
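To see where the space goes during the sync, rather than only the total that df -h reports, a small script can break the usage down by file type. The following is a minimal sketch, not part of the original question: the function name summarize_usage is hypothetical, and the _.gstmp suffix is an assumption about how gsutil names its temporary download files, so adjust it if your files look different.

```python
import os
import shutil


def summarize_usage(mount_path, temp_suffix="_.gstmp"):
    """Report filesystem totals and the bytes held by regular vs. temporary files.

    temp_suffix is an assumption about how in-progress downloads are named.
    Sizes are apparent file sizes (st_size), not allocated blocks, so the sum
    can differ from what df reports, especially with many small files.
    """
    total, used, free = shutil.disk_usage(mount_path)
    regular_bytes = 0
    temp_bytes = 0
    for root, _dirs, files in os.walk(mount_path):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                # The file was renamed or removed while we were walking.
                continue
            if name.endswith(temp_suffix):
                temp_bytes += size
            else:
                regular_bytes += size
    return {
        "total": total,
        "used": used,
        "free": free,
        "regular_bytes": regular_bytes,
        "temp_bytes": temp_bytes,
    }
```

Calling summarize_usage("<LOCAL_MOUNT_PATH>") inside the pod a few times during the sync would show whether temporary files, rather than finished files, account for the growth.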

Further, if I provision a Persistent Volume of 200Gi and retry the sync, it finishes successfully and df -h shows that the used space in the Persistent Volume is 47GB, as expected (this is after gsutil rsync is completed).

So it seems that gsutil rsync uses far more space while syncing than I would expect. Why is this? Is there a way to change how gsutil rsync is done so that it doesn't require a larger Persistent Volume than necessary?

It should be noted that there are a lot of individual files, and that the pod is restarted about 8 times during the sync.

Spencer
  • Can you share the StatefulSet file which is using the rsync? I need it for reference. – gaurav agnihotri Mar 14 '22 at 05:59
  • Hi @gauravagnihotri. I can't share it exactly as it is because it's an internal company deployment doc, but I can share an anonymized version, which should contain the info you are looking for: https://pastebin.com/CW79aXsR – Spencer Mar 19 '22 at 22:27

1 Answer


rsync transfers each file's contents to a temporary file in the target folder first. If the transfer succeeds, the temporary file is renamed to become the target file. If the transfer fails, the temporary file is deleted. You could try adding the --inplace flag to the command. According to the link: “This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.” Note that this quote describes the --inplace option of the standalone rsync tool, so check that gsutil rsync accepts the flag before relying on it.
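The write-to-a-temporary-file-then-rename pattern described above can be sketched as follows. This is an illustration of the general technique only, not gsutil's or rsync's actual implementation; the function name copy_via_temp and the .tmp suffix are hypothetical.

```python
import os
import tempfile


def copy_via_temp(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path through a temporary file in the target folder."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # Create the temporary file next to the destination so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                tmp.write(chunk)
        # Success: move the temporary file into place as the target file.
        os.replace(tmp_path, dst_path)
    except BaseException:
        # Failure: delete the partial temporary file, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In this sketch the cleanup branch only runs if the process is still alive to handle the error. If the process is killed outright, for example when a pod is restarted, the partial temporary file stays on disk. Whether that accounts for the extra space in the question is an assumption that would need checking on the volume itself.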

Yanan C