
I need to copy data between Google Cloud Storage (GCS) buckets: both the source and the destination are GCS buckets. Since I perform the copy along with some other operations in small batches, I use the gsutil cp command from a bash shell script.

The exact command I use is as follows:

# objpaths_file lists the object paths, one per line: gs://source_bucket/obj1, ...
objlist=objpaths_file
gsutil -m cp -I gs://target_bucket < "$objlist"

The objects to be copied have custom metadata fields. Copying objects this way with "gsutil cp" does copy the custom metadata key-value pairs, provided each metadata key has a non-null value. If a custom metadata key has a null value, the copied object in the destination does not have that key at all (the key with the null value is dropped during the copy).

So my questions are:

  • Is there any other mechanism that will allow me to programmatically copy the objects with all of their custom metadata keys (regardless of whether the key's value is NULL)?
  • Is there an option to change this behaviour of the gsutil cp command?
  • Alternatively, I am also open to suggestions for programmatically recreating the missing metadata keys in the destination bucket and filling them with null values. Of course, this option should only add the missing keys with null values and leave key-value pairs with valid values intact.

And another less relevant question :-)

  • Is this gsutil behaviour (skipping a custom metadata key if its value is NULL) expected, or does it amount to a defect? Should I approach Google support seeking a fix in that case?

Thanks for your response

Yogesh

Yogesh Devi
  • I think it is important to define which metadata keys with a null value are dropped. Metadata is Key/Value, not key with no value. Google defines Cloud Storage Object Metadata as `Metadata exists as key:value pairs.` [reference](https://cloud.google.com/storage/docs/metadata#introduction). Therefore, if you have null values, you are not conforming. The issue is your problem, not Google's. Your answer recommends switching to `gcloud storage`. I would not expect non-conforming behavior to be long-lived. – John Hanley Jun 28 '23 at 06:22
  • @JohnHanley Thanks a lot for the comment. I agree with you in a pedantic sense, in a perfect world. However, Google allows storing metadata as keys with NULL values. We have existing data and cloud-native code that count on this behaviour, so taking the high ground of "conformance" now does not help. Google allowed the non-conforming behaviour in the first place, and it is not possible to go back and fix the "non-conforming" data and code, so we need a solution we can work with. I am still glad that "gcloud", which is newer than gsutil, has the more expected behaviour, and I hope it stays. – Yogesh Devi Jun 28 '23 at 09:10
  • I am not trying to take the high ground. Can you share a technical reason why you need Key/NullValue metadata in Cloud Storage? Metadata often becomes HTTP headers, which is why I asked which metadata keys. HTTP does not support NULL K/V pairs; they are actually banned by the standard. Declining to recommend a strategy that can break systems or violate standards is not taking the high ground. You have a short-term solution using `gcloud storage`, but how long will that last? Until you offer a technical reason why null values are required, my suggestion is to fix your implementation. – John Hanley Jun 28 '23 at 17:48
  • Hi @JohnHanley, we are writing utilities that work with existing legacy data that has key/value metadata with NULLs for values. I am not privy to the reason why this data was stored that way (with NULL values) in the first place. We now need to move that data around while preserving its structure (not losing metadata keys). There is legacy cloud-native code that needs all metadata keys to be present, so at present we do not have a choice. – Yogesh Devi Jun 29 '23 at 13:15
  • Are you a third-party software vendor writing tools for other customers? If yes, my advice will be different. – John Hanley Jun 29 '23 at 18:08

1 Answer


Since the gsutil cp command drops these metadata keys, use the gcloud storage cp command instead.

The command below copies all metadata fields, including keys with NULL values:

gcloud storage cp gs://<source bucket>/objectname gs://<target bucket>/

gcloud storage cp also accepts multiple source arguments, so it can behave like gsutil's batch mode (I tried it with 1000 objects):

gcloud storage cp gs://<source bucket>/object1 gs://<source bucket>/object2 ... gs://<target bucket>/
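
To drive the multi-argument form from the same object list file used in the question, something like the following might work. It is only a minimal sketch, not a tested production script: the function name copy_in_batches, the default batch size of 500, and the COPY_CMD override (there so the batching can be dry-run with echo) are all assumptions made for illustration. The only gcloud behaviour it relies on is the multi-source form of gcloud storage cp shown above.

```shell
#!/usr/bin/env bash
# Sketch: copy the objects listed in a file (one gs:// path per line)
# with `gcloud storage cp`, passing them as arguments in fixed-size batches.
# COPY_CMD is a hypothetical override for dry runs, e.g. COPY_CMD=echo.

copy_in_batches() {
  local objlist="$1" target="$2" batch_size="${3:-500}"
  local -a cmd batch=()
  local path

  # Split the command string into words (defaults to the real copy command).
  read -r -a cmd <<< "${COPY_CMD:-gcloud storage cp}"

  # Read the list on fd 3 so the copy command cannot consume it via stdin.
  while IFS= read -r path <&3 || [ -n "$path" ]; do
    if [ -z "$path" ]; then continue; fi   # skip blank lines
    batch+=("$path")
    if [ "${#batch[@]}" -ge "$batch_size" ]; then
      "${cmd[@]}" "${batch[@]}" "$target"
      batch=()
    fi
  done 3< "$objlist"

  # Copy whatever is left in the final, partially filled batch.
  if [ "${#batch[@]}" -gt 0 ]; then
    "${cmd[@]}" "${batch[@]}" "$target"
  fi
}
```

It could be called as copy_in_batches objpaths_file gs://target_bucket/ 500, or with COPY_CMD=echo in front to print the commands without running them. Batching keeps each command line well below the shell's argument-length limit. Recent gcloud releases also appear to document a --read-paths-from-stdin flag for gcloud storage cp, which may make the batching unnecessary; check gcloud storage cp --help before relying on either approach.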

Yogesh Devi
  • I do not think this is an answer. The issue is that your application does not conform to Google Cloud Storage (see my comment to your question). IMHO this solution is a side-effect and does not conform to Google's documentation. Since this issue is now public, Google will either fix the command to conform (remove null value support) or change the documentation to support null key values. However, until that time, recommending a side-effect is not a good idea. – John Hanley Jun 28 '23 at 06:30
  • Please see my comment above – Yogesh Devi Jun 28 '23 at 09:12
  • Let's assume I am right just for a moment. Imagine this scenario. Google holds a meeting to review this technical item. Google decides that this feature is non-conforming because it violates the HTTP standard forbidding null values. The CLI `gcloud storage` is patched to prevent further usage. You receive an email notifying you of the problem with 180 days to correct it. Future readers of your answer assume they can also use null values making this answer wrong. – John Hanley Jun 28 '23 at 18:14
  • I get it from a long-term perspective. Tactically, I am just going to pray that gcloud continues to behave as it does now until I finish the job on our hands :-) – Yogesh Devi Jun 29 '23 at 13:16