I am trying to rename a blob (which can be quite large) after having uploaded it to a temporary location in the bucket.

Reading the documentation it says:

Warning: This method will first duplicate the data and then delete the old blob. This means that with very large objects renaming could be a very (temporarily) costly or a very slow operation. If you need more control over the copy and deletion, instead use google.cloud.storage.blob.Blob.copy_to and google.cloud.storage.blob.Blob.delete directly.

But I can find absolutely no reference to copy_to anywhere in the SDK (or elsewhere really).

Is there any way to rename a blob from A to B without the SDK copying the file? In my case this would overwrite B, but I can remove B first if that's easier.

The reason is checksum validation: I'll upload it under A first to make sure it's successfully uploaded (and doesn't trigger DataCorruption), and only then replace B (the live object).

Niklas B
  • If you include an MD5 hash with the request, Google Cloud will verify the upload for you. @David's answer has more details. – John Hanley Nov 16 '22 at 21:53

1 Answer

GCS itself does not support renaming objects. Renaming with a copy+delete is done in the client as a helper, and there is no better way to rename an object at the moment.
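
For reference, the copy+delete that the rename helper performs can be sketched by hand as below. This is an illustration, not the library's internal code: `rename_by_copy` is a hypothetical helper name, and it assumes `bucket` is a `google.cloud.storage.Bucket` (obtained from `storage.Client().bucket(...)`). It uses `Bucket.copy_blob` and `Blob.delete`, which appear to be the calls the documentation warning is pointing at.

```python
def rename_by_copy(bucket, old_name, new_name):
    """Copy old_name to new_name in the same bucket, then delete old_name.

    `bucket` is assumed to be a google.cloud.storage.Bucket. The copy is
    performed by GCS itself, so the data is not downloaded and re-uploaded.
    """
    source = bucket.blob(old_name)
    # Copy within the same bucket under the new name (overwrites new_name).
    copied = bucket.copy_blob(source, bucket, new_name)
    # Remove the original only after the copy call has returned.
    source.delete()
    return copied
```

Doing the two steps separately lets you decide what happens if the copy fails, for example leaving the original in place.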

Since your goal is checksum validation, there is a better solution: upload directly to your destination and use GCS's built-in checksum verification. How you do this depends on the API:

  • JSON objects.insert: Set the crc32c or md5Hash property in the object metadata.
  • XML PUT object: Set x-goog-hash header.
  • Python SDK Blob.upload_from_* methods: Set checksum="crc32c" or checksum="md5" method parameter.
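
For the Python SDK option, a minimal sketch might look like the following. The helper name `upload_verified`, the bucket name `my-bucket`, and the file names are placeholders invented for illustration; the `checksum` parameter is the one mentioned in the list above.

```python
def upload_verified(bucket, source_path, destination_name, checksum="crc32c"):
    """Upload a local file straight to its final name with checksum verification.

    `bucket` is assumed to be a google.cloud.storage.Bucket. `checksum` may be
    "crc32c" or "md5"; a mismatch is expected to surface as an exception.
    """
    blob = bucket.blob(destination_name)
    blob.upload_from_filename(source_path, checksum=checksum)
    return blob


def example_usage():
    # Imported here so the helper above can be read and tested without the SDK.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")  # placeholder bucket name
    return upload_verified(bucket, "local-file.bin", "B")
```

Calling `example_usage()` requires the `google-cloud-storage` package and valid credentials.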
David
  • I read about it, but it also says it removes the file if it fails. I read that as if file "A" exists, and we upload a new file "A" and that checksum fails it will remove file "A". Which means I've now "lost" the old "A" because of the checksum fail - but maybe it only removes the newly uploaded file and not the one we are overwriting? – Niklas B Nov 17 '22 at 22:01
  • This is the code I was referring to: https://github.com/googleapis/python-storage/blob/main/google/cloud/storage/blob.py#L2212-L2218 – Niklas B Nov 18 '22 at 08:04
  • It looks like you are correct: there are situations where, when attempting to overwrite an object and corruption happens, the Python SDK will delete the corrupted version, leaving no object. (I deleted a comment that said otherwise, so that incorrect info is not left up.) – David Nov 18 '22 at 16:28