1

We currently have a small web app, part of which is file uploads. We are using Plupload on the client with chunking enabled so that large files can be uploaded. The files are saved on the app server, and the chunks are appended as they come up.

Now we are moving to Amazon S3 for file storage, with the possibility of multiple app servers. I'm having trouble figuring out how to handle these chunks. I was trying to follow their example, but I'm running into problems. The meat of what I'm trying looks like this:

UploadPartRequest uploadRequest = new UploadPartRequest()
    .withBucketName(bucket).withKey(key)
    .withUploadId(uploadId).withPartNumber(partNumber)
    .withPartSize(bytes.length)
    .withInputStream(new ByteArrayInputStream(bytes));

s3Client.uploadPart(uploadRequest);

The problem I'm having is that I need to somehow know the uploadId of the chunk. I have it when I get the InitiateMultipartUploadResult from initializing the upload, but how do I associate that with later chunks that come up? I thought I could perhaps send it down with the first response, and then send it back up with each chunk request. That didn't seem like too much of a stretch.

Then I found that in order to complete the upload I need a List<PartETag> containing the PartETags returned from each part upload to Amazon S3. So, my next question was how do I save all of these PartETags while the chunks are being uploaded from the browser? My first thought was that I could send down the PartETag of each chunk in the response, and then store those client side. I'm not sure if there's a way of knowing when the last chunk is being uploaded, so that I can send up all these PartETags. If there's not, I'd just have to send up all the ones I have each time, and then only the last request would use them. This all seems a little hacky to me.

So, I'm thinking someone has to have dealt with this before. Is there a good, standard way of doing this?

I thought about constructing the file on the app server and then sending it over to S3, but with multiple app servers, the chunks aren't guaranteed to end up in the same place.

Another thought I've had is to store all this information in the database during the upload, but I wasn't sure I wanted to have to go hit the database with each chunk request. Are there any other options besides this?

I appreciate any help anyone can provide.

dnc253

2 Answers

1

Try our IaaS solution:

https://uploadcare.com

It supports file sizes up to 5 GB. Here is an article about a successful use case for uploading large files using our system:

https://community.skuidify.com/skuid/topics/how_to_upload_large_files_using_uploadcare_com

David Avsajanishvili
0

Correct me if I'm wrong, but as I understand your question, your web servers act as proxies between the browser and S3.

The problem I'm having is that I need to somehow know the uploadId of the chunk. I have it when I get the InitiateMultipartUploadResult from the initializing of the upload, but how do I associate that with later chunks that come up?

On BeforeUpload, you may add the uploadId as a query string parameter, as in this answer.

My first thought was I could send down the PartETag of each chunk in the response, and then store those client side.

This seems like a good idea. You could then alter the query string as above on 'ChunkUploaded' to add the PartETag just received, thereby transferring all previously received PartETags with each request. I'm not sure whether altering the query string between chunks is possible, or whether you can synchronously do some processing before the upload of the next chunk starts, but I would say it is worth a try.

I'm not sure if there's a way of knowing when the last chunk is being uploaded, so that I can send up all these PartETags.

This can be found in the PHP samples in the Plupload download. Plupload sends two POST parameters to the server:

  • chunks : total number of chunks of the upload (0 if upload not chunked)
  • chunk : index of the current chunk being uploaded

The current chunk is the last one when chunks == 0 || chunk == chunks - 1
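The check above could be sketched server side in Java roughly as follows. This is only a minimal illustration: the class and method names (ChunkPosition, isLastChunk, toPartNumber) are hypothetical, and it assumes the chunks and chunk POST parameters have already been parsed into ints. The second helper reflects that S3 part numbers start at 1 while the chunk index starts at 0.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class ChunkPosition {
    private ChunkPosition() {}

    /**
     * @param chunks total number of chunks sent by Plupload (0 if the upload is not chunked)
     * @param chunk  zero-based index of the chunk in the current request
     * @return true when the current request carries the final chunk
     */
    static boolean isLastChunk(int chunks, int chunk) {
        return chunks == 0 || chunk == chunks - 1;
    }

    /** S3 part numbers start at 1, so shift the zero-based chunk index by one. */
    static int toPartNumber(int chunk) {
        return chunk + 1;
    }

    public static void main(String[] args) {
        System.out.println(isLastChunk(0, 0)); // true: upload is not chunked
        System.out.println(isLastChunk(5, 3)); // false: one more chunk to come
        System.out.println(isLastChunk(5, 4)); // true: final chunk of five
        System.out.println(toPartNumber(0));   // 1
    }
}
```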

Community
jbl
  • Server side I know when the last chunk is being uploaded, but I don't think there's a way of knowing that client side. I decided to go the DB route, because I didn't want my client side code to have to know so much about what the server is doing. That way if our storage mechanism changes, I don't have to refactor the client code. Thanks for the feedback. – dnc253 Nov 26 '13 at 16:30
  • @dnc you're welcome. BTW to know the number of chunks client-side, you have the option to pass it as part of the response triggering ChunkUploaded. – jbl Nov 26 '13 at 16:53
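For the database route mentioned in the comments, the state to keep per upload is small: the uploadId and the ETag returned for each part. The sketch below is a hypothetical illustration, not the asker's actual implementation. It uses an in-memory map as a stand-in for a shared database table (with multiple app servers the real store would have to be shared), a made-up fileId that the client is assumed to send with every chunk, and a small Part class standing in for the SDK's PartETag so the example has no AWS dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: an in-memory stand-in for a shared database table.
final class UploadStateStore {

    /** Stand-in for the SDK's PartETag: a part number plus the ETag S3 returned. */
    static final class Part {
        final int partNumber;
        final String eTag;

        Part(int partNumber, String eTag) {
            this.partNumber = partNumber;
            this.eTag = eTag;
        }
    }

    private static final class State {
        final String uploadId;
        // Sorted by part number, since the completion request lists parts in ascending order.
        final Map<Integer, String> eTags = new ConcurrentSkipListMap<>();

        State(String uploadId) {
            this.uploadId = uploadId;
        }
    }

    private final Map<String, State> byFileId = new ConcurrentHashMap<>();

    /** Call after initiating the multipart upload (first chunk). */
    void begin(String fileId, String uploadId) {
        byFileId.put(fileId, new State(uploadId));
    }

    /** Look up the uploadId for a later chunk of the same file. */
    String uploadIdFor(String fileId) {
        return require(fileId).uploadId;
    }

    /** Call after each part upload with the ETag that came back. */
    void recordPart(String fileId, int partNumber, String eTag) {
        require(fileId).eTags.put(partNumber, eTag);
    }

    /** Call on the last chunk: returns all parts in ascending order and forgets the upload. */
    List<Part> finish(String fileId) {
        State state = byFileId.remove(fileId);
        if (state == null) {
            throw new IllegalStateException("Unknown upload: " + fileId);
        }
        List<Part> parts = new ArrayList<>();
        state.eTags.forEach((number, tag) -> parts.add(new Part(number, tag)));
        return parts;
    }

    private State require(String fileId) {
        State state = byFileId.get(fileId);
        if (state == null) {
            throw new IllegalStateException("Unknown upload: " + fileId);
        }
        return state;
    }
}
```

With something like this behind the upload endpoint, the client only needs to send the file identifier and the chunk parameters; the uploadId and ETags never leave the server, which matches the goal of keeping storage details out of the client code.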