
Okay, I have working apps that use Amazon S3 multipart upload; they call CreateMultipartUpload, UploadPart, and CompleteMultipartUpload.

Now we are migrating to Google Cloud Storage, and we have a problem with multipart. As far as I understand, Google doesn't support S3 multipart; I got this info from here: Google Cloud Storage support of S3 multipart upload.

So I see that the closest Google method is Compose (https://cloud.google.com/storage/docs/composite-objects), where I upload the parts as separate objects and then send a request to combine them. Or I can use uploadType=multipart (https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload#resumable), but this seems to be completely different from S3 multipart. And there are resumable uploads (https://cloud.google.com/storage/docs/resumable-uploads), which seem to allow uploading a file in chunks, but without a complete-multipart step.

What is the best option to use? Some services already use CreateMultipartUpload, UploadPart, and CompleteMultipartUpload, and I need to write an "adapter" for these services to make them compatible with Google Cloud Storage.

NIck

1 Answer


Update: the answer below is no longer correct. GCS now supports multipart uploads through the XML API: https://cloud.google.com/storage/docs/xml-api/post-object-multipart
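Given that XML API support, services that already speak the S3 multipart API can often keep their existing flow. As a minimal sketch (not an official recipe), assuming you've generated GCS HMAC credentials for a service account and pointed boto3 at the GCS endpoint; bucket, key, and credential values are placeholders:

```python
# Sketch: reusing the S3-style multipart calls against GCS's XML API.
# Assumes GCS HMAC credentials; all names below are placeholders.
import boto3

client = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="GOOG_HMAC_ACCESS_ID",      # GCS HMAC key, not an AWS key
    aws_secret_access_key="GOOG_HMAC_SECRET",
)

# The same three calls the question mentions, now served by GCS:
mpu = client.create_multipart_upload(Bucket="my-bucket", Key="big-file")
part = client.upload_part(
    Bucket="my-bucket", Key="big-file",
    UploadId=mpu["UploadId"], PartNumber=1, Body=b"part one bytes",
)
client.complete_multipart_upload(
    Bucket="my-bucket", Key="big-file", UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": [{"PartNumber": 1, "ETag": part["ETag"]}]},
)
```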

You are correct. Google Cloud Storage does not currently support multipart upload.

The main benefits of multipart upload are that multiple streams can upload in parallel from one or more machines, and that a partial upload failure does not ruin the whole upload. The best way to get those same benefits with GCS is to upload the parts as separate objects and then use Compose to combine them into a final object. Indeed, this is exactly what the gsutil command-line utility does when uploading in parallel.
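A minimal sketch of that pattern with the google-cloud-storage Python client (bucket, object, and local file names are placeholders); note that a single compose call accepts at most 32 source objects:

```python
# Sketch: upload "parts" as separate objects, then Compose them.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")

# Upload each part as its own object (these uploads could run in parallel).
part_names = ["big-file.part0", "big-file.part1", "big-file.part2"]
parts = [bucket.blob(name) for name in part_names]
for i, blob in enumerate(parts):
    blob.upload_from_filename(f"local-chunk-{i}")

# Stitch the parts into the final object (up to 32 sources per compose call).
final = bucket.blob("big-file")
final.compose(parts)

# Clean up the intermediate part objects.
for blob in parts:
    blob.delete()
```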

Resumable uploads are a great tool if you want to upload a single object in a single stream, in order, and you want the ability to resume if the connection is lost.
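A rough sketch of a chunked resumable upload, assuming the google-cloud-storage client for the session and plain `requests` for the chunk PUTs; names, sizes, and paths are placeholders:

```python
# Sketch: start a resumable session, then PUT the chunks in order.
import os
import requests
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("big-file")

size = os.path.getsize("local-file")
session_url = blob.create_resumable_upload_session(
    content_type="application/octet-stream", size=size
)

CHUNK = 8 * 1024 * 1024  # non-final chunks must be a multiple of 256 KiB
with open("local-file", "rb") as f:
    offset = 0
    while offset < size:
        data = f.read(CHUNK)
        end = offset + len(data) - 1
        # GCS answers 308 for intermediate chunks; if the connection drops,
        # you can query the session and resume from the last confirmed byte.
        requests.put(
            session_url,
            data=data,
            headers={"Content-Range": f"bytes {offset}-{end}/{size}"},
        )
        offset += len(data)
```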

"uploadtype=multipart" uploads are a bit different. They are a way to specify an object's complete metadata and also its data in a single upload operation, using an HTTP multipart request.

Brandon Yarbrough
  • I'm wondering whether multipart is useful at all when the upload happens from a single machine, since the bandwidth may be shared between part uploads versus the whole bandwidth being used by a single part in a straight upload. All S3 SDKs use multipart under the hood when uploading anything more than a few MBs. I'm wondering if it actually improves the throughput; any idea? – pinkpanther Aug 03 '20 at 14:12
  • Uploading several parts of an object in parallel from a single machine does in fact frequently increase throughput, due to how TCP works. You can achieve this with GCS by uploading the parts as separate objects and then using the `compose` API call to combine them into a single final object. The `gsutil` command can do this for your uploads if you use the `-m` flag. – Brandon Yarbrough Aug 03 '20 at 23:47
  • Thanks. I should have said same client/process instead of "single machine". But I think the answer is the same, right? – pinkpanther Aug 06 '20 at 17:54
  • Do be careful in distinguishing "supported" and "offered as a preview feature". The latter is a bit riskier to use in production systems, and the referenced article both states this is still considered a preview feature and may have limited support. Regrettably, this means your original answer still stands. – NBJack Jun 10 '21 at 17:34