23

I know Amazon S3 added multipart upload for huge files. That's great. What I also need is similar functionality on the client side for customers who get partway through downloading a gigabyte-plus file and hit errors.

I realize browsers have some level of retry and resume built in, but when you're talking about huge files I'd like to be able to pick up where they left off regardless of the type of error.

Any ideas?

Thanks, Brian

Bth
  • I've been looking for some useful bit of sample code or SDK documentation without any luck. The main issue is that Amazon doesn't generate the Content-MD5 hash when you ask for a range of data. So if you have the file partially downloaded, what you really want to do is calculate the MD5 of what you have downloaded and then ask Amazon whether that range of bytes has the same hash, so you can just append the rest of the file from Amazon. No such "hey Amazon, give me the MD5 for this range of bytes in the file on S3" API exists AFAIK :-( – kenyee Jan 17 '14 at 16:48
  • Hi Brian. If you were able to get your question answered, can you choose a correct answer? Helps other folks who come to the page looking for that same help. – rICh Jun 24 '15 at 14:53

5 Answers

13

S3 supports the standard HTTP "Range" header if you want to build your own solution.

S3 Getting Objects
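For example, here's a minimal sketch of resuming a download with a plain Range request (the object URL and local path are hypothetical, and it assumes the URL is publicly readable or presigned):

require 'net/http'

# Hypothetical public or presigned object URL and local destination.
uri = URI('https://my-bucket.s3.amazonaws.com/big-file.bin')
local_path = 'big-file.bin'

# Resume from however many bytes are already on disk.
offset = File.exist?(local_path) ? File.size(local_path) : 0

Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  req = Net::HTTP::Get.new(uri)
  req['Range'] = "bytes=#{offset}-"   # ask S3 for the remaining bytes only
  http.request(req) do |res|
    raise "unexpected status #{res.code}" unless res.code == '206'   # 206 Partial Content
    File.open(local_path, 'ab') do |f|   # append to the partial file
      res.read_body { |chunk| f.write(chunk) }
    end
  end
end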

Uriah Carpenter
  • Java API: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/GetObjectRequest.html#setRange(long,%20long) – Michal Čizmazia Mar 12 '14 at 16:06
4

I use aria2c. For private content, you can use "GetPreSignedUrlRequest" to generate temporary private URLs that you can pass to aria2c.
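GetPreSignedUrlRequest is the .NET SDK call; as a rough sketch of the same idea with the Ruby SDK (aws-sdk-s3 v3, hypothetical bucket/key/region), you could presign the URL and hand it to aria2c:

require 'aws-sdk-s3'

s3 = Aws::S3::Resource.new(region: 'us-east-1')          # hypothetical region
object = s3.bucket('my-bucket').object('big-file.bin')   # hypothetical bucket/key
url = object.presigned_url(:get, expires_in: 3600)       # temporary private URL

# aria2c: -x/-s split the download across connections, -c resumes a partial file.
system('aria2c', '-x', '8', '-s', '8', '-c', '-o', 'big-file.bin', url)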

Ameer Deen
2

Just updating for the current situation: S3 natively supports multipart GET as well as PUT. https://youtu.be/uXHw0Xae2ww?t=1459

Adolfo
rICh
1

S3 has a feature called byte-range fetches. It's kind of the download complement to multipart upload:

Using the Range HTTP header in a GET Object request, you can fetch a byte-range from an object, transferring only the specified portion. You can use concurrent connections to Amazon S3 to fetch different byte ranges from within the same object. This helps you achieve higher aggregate throughput versus a single whole-object request. Fetching smaller ranges of a large object also allows your application to improve retry times when requests are interrupted. For more information, see Getting Objects.

Typical sizes for byte-range requests are 8 MB or 16 MB. If objects are PUT using a multipart upload, it’s a good practice to GET them in the same part sizes (or at least aligned to part boundaries) for best performance. GET requests can directly address individual parts; for example, GET ?partNumber=N.

Source: https://docs.aws.amazon.com/whitepapers/latest/s3-optimizing-performance-best-practices/use-byte-range-fetches.html
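As a rough sketch of a ranged GET with the Ruby SDK (aws-sdk-s3 v3; bucket, key, and sizes here are hypothetical):

require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(region: 'us-east-1')   # hypothetical region

# Fetch just the first 8 MB of a (hypothetical) large object.
resp = s3.get_object(
  bucket: 'my-bucket',
  key: 'big-file.bin',
  range: 'bytes=0-8388607'
)
File.open('big-file.bin.part0', 'wb') { |f| f.write(resp.body.read) }

# If the object was uploaded with multipart upload, parts can be addressed directly.
part = s3.get_object(bucket: 'my-bucket', key: 'big-file.bin', part_number: 1)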

Yann Stoneman
0

NOTE: For Ruby users only

Try the aws-sdk gem for Ruby and download with:

object = Aws::S3::Object.new(...)
object.download_file('path/to/file.rb')

It downloads large files using multipart by default:

Files larger than 5 MB are downloaded using the multipart method

http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#download_file-instance_method
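For completeness, a minimal end-to-end sketch with that API (assuming the aws-sdk-s3 v3 gem and hypothetical bucket/key/region):

require 'aws-sdk-s3'

# download_file fetches objects larger than 5 MB in multiple parts.
object = Aws::S3::Object.new(
  bucket_name: 'my-bucket',                          # hypothetical bucket
  key: 'big-file.bin',                               # hypothetical key
  client: Aws::S3::Client.new(region: 'us-east-1')   # hypothetical region
)
object.download_file('/tmp/big-file.bin')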

kenju