I am using the gcloud storage cp command to copy a large number of files from one GCS bucket to another, using the command below:
gcloud storage cp -r "gs://test-1/*" "gs://test-3" --encryption-key=XXXXXXXXXXXXXXXXXXXXXXX --storage-class=REGIONAL
I have a use case where I want to copy files but skip those that have already been copied. The --manifest-path flag can solve this problem for me, using the command below:
gcloud storage cp -r "gs://test-1/*" "gs://test-3" --encryption-key=XXXXXXXXXXXXXXXXXXXXXXX --manifest-path=manifest.csv --storage-class=REGIONAL
However, I will be running this command on Kubernetes, where pod storage is ephemeral and the manifest file will be lost, so I want to keep it hosted somewhere persistent.
I tried passing a Google Cloud Storage location for the manifest file, but it gave me an error:
gcloud storage cp -r "gs://test-1/*" "gs://test-3" --encryption-key=XXXXXXXXXXXXXXXXXXXXXXX --manifest-path=gs://manifests-bucket/manifest.csv --storage-class=REGIONAL
ERROR: (gcloud.storage.cp) Unable to write file [gs://manifests-bucket/manifest.csv]: [Errno 2] No such file or directory: 'gs://manifests-bucket/manifest.csv'
How can I pass a Google Cloud Storage object path as the manifest file path?
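For context, the fallback I am considering is sketched below. It assumes that --manifest-path only accepts a local file path (which is what the "No such file or directory" error suggests) and that a later run skips objects already marked OK in an existing manifest, as I read the flag's documentation. The function name copy_with_remote_manifest, the MANIFEST_GCS and MANIFEST_LOCAL variables, and the /tmp/manifest.csv path are hypothetical names of my own, not part of gcloud.

```shell
# Hypothetical locations; adjust for your environment.
MANIFEST_GCS="${MANIFEST_GCS:-gs://manifests-bucket/manifest.csv}"
MANIFEST_LOCAL="${MANIFEST_LOCAL:-/tmp/manifest.csv}"

# Usage: copy_with_remote_manifest SRC DST [extra gcloud storage cp flags...]
copy_with_remote_manifest() {
  cwrm_src="$1"
  cwrm_dst="$2"
  shift 2
  cwrm_rc=0

  # 1. Restore the manifest from the previous run, if one exists.
  #    On the very first run this download fails, which is expected.
  gcloud storage cp "$MANIFEST_GCS" "$MANIFEST_LOCAL" \
    || echo "No previous manifest found; starting fresh." >&2

  # 2. Run the real copy against the local manifest file.
  #    Any extra flags (encryption key, storage class) are passed through.
  gcloud storage cp -r "$cwrm_src" "$cwrm_dst" \
    --manifest-path="$MANIFEST_LOCAL" "$@" || cwrm_rc=$?

  # 3. Persist the manifest even if the copy failed, so the next pod
  #    can resume from it.
  if [ -f "$MANIFEST_LOCAL" ]; then
    gcloud storage cp "$MANIFEST_LOCAL" "$MANIFEST_GCS" || cwrm_rc=$?
  fi

  return "$cwrm_rc"
}
```

In the pod I would call it as copy_with_remote_manifest "gs://test-1/*" "gs://test-3" --encryption-key=XXXXXXXXXXXXXXXXXXXXXXX --storage-class=REGIONAL. The manifest is only uploaded at the end, so a pod killed mid-copy would lose that run's entries. I would still prefer a way to point --manifest-path at the bucket directly.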
References : https://cloud.google.com/sdk/gcloud/reference/storage/cp#--manifest-path
EDIT 1:
I tried granting permissions on the manifest bucket, assuming gcloud storage cp uses the storage-transfer-service service account behind the scenes:
gsutil iam ch serviceAccount:project-XXXXXXXXX@storage-transfer-service.iam.gserviceaccount.com:objectCreator,legacyBucketReader gs://manifests-bucket/
References :
https://cloud.google.com/storage-transfer/docs/manifest https://cloud.google.com/storage-transfer/docs/source-cloud-storage#grant_the_required_permissions
EDIT 2:
I tried the gsutil rsync command, passing the encryption key, but it doesn't do anything. The output of the command is below:
➜ gsutil -m -o "GSUtil:encryption_key=XXXXXXXXXXXXXXXXX" rsync gs://test-1 gs://test-3
WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".
Building synchronization state...
If you experience problems with multiprocessing on MacOS, they might be related to https://bugs.python.org/issue33725. You can disable multiprocessing by editing your .boto config or by adding the following flag to your command: `-o "GSUtil:parallel_process_count=1"`. Note that multithreading is still available even if you disable multiprocessing.
Starting synchronization...
If you experience problems with multiprocessing on MacOS, they might be related to https://bugs.python.org/issue33725. You can disable multiprocessing by editing your .boto config or by adding the following flag to your command: `-o "GSUtil:parallel_process_count=1"`. Note that multithreading is still available even if you disable multiprocessing.