
I implemented S3 multi-part upload, both the high-level and the low-level versions, based on the sample code from http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?HLuploadFileJava.html and http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?llJavaUploadFile.html
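
For the high-level case, what I run is essentially the TransferManager pattern from that sample; the sketch below is a minimal version of it, with placeholder credentials, bucket, key, and file path rather than my actual values:

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.Upload;

    public class HighLevelUpload {
        public static void main(String[] args) throws Exception {
            // Placeholder credentials and names -- not the real values.
            TransferManager tm = new TransferManager(
                    new BasicAWSCredentials("accessKey", "secretKey"));
            File largeFile = new File("/data/13gb-test-file");

            // TransferManager splits the file into parts and uploads them
            // concurrently; waitForCompletion() blocks until every part finishes.
            Upload upload = tm.upload("my-bucket", "13gb-test-file", largeFile);
            upload.waitForCompletion();
        }
    }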

When I uploaded files smaller than 4 GB, the uploads completed without any problem. When I uploaded a 13 GB file, the code started throwing IO exceptions (broken pipe), and it still failed after retries.

Here is how to reproduce the scenario, using the 1.1.7.1 release:

  1. create a new bucket in the US Standard region
  2. create a large EC2 instance as the client to upload the file
  3. create a 13 GB file on the EC2 instance
  4. run the sample code from either the high-level or the low-level S3 API documentation page on the EC2 instance
  5. test with any of three part sizes: the default (5 MB), 100,000,000 bytes, or 200,000,000 bytes (the low-level path with an explicit part size looks roughly like the sketch after this list)
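
For the low-level case, the upload loop follows the documented initiate/upload-part/complete pattern; the sketch below is roughly what I run, again with placeholder credentials, bucket, key, and file path, and with abort/error handling omitted for brevity:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
    import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
    import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
    import com.amazonaws.services.s3.model.PartETag;
    import com.amazonaws.services.s3.model.UploadPartRequest;

    public class LowLevelUpload {
        public static void main(String[] args) {
            // Placeholder credentials and names -- not the real values.
            AmazonS3Client s3 = new AmazonS3Client(
                    new BasicAWSCredentials("accessKey", "secretKey"));
            String bucket = "my-bucket";
            String key = "13gb-test-file";
            File file = new File("/data/13gb-test-file");
            long partSize = 100000000L; // one of the part sizes I tested

            InitiateMultipartUploadResult init = s3.initiateMultipartUpload(
                    new InitiateMultipartUploadRequest(bucket, key));
            List<PartETag> partETags = new ArrayList<PartETag>();

            long filePosition = 0;
            for (int partNumber = 1; filePosition < file.length(); partNumber++) {
                long size = Math.min(partSize, file.length() - filePosition);
                UploadPartRequest partRequest = new UploadPartRequest()
                        .withBucketName(bucket)
                        .withKey(key)
                        .withUploadId(init.getUploadId())
                        .withPartNumber(partNumber)
                        .withFileOffset(filePosition)
                        .withFile(file)
                        .withPartSize(size);
                // Each part's ETag is needed to complete the upload.
                partETags.add(s3.uploadPart(partRequest).getPartETag());
                filePosition += size;
            }

            s3.completeMultipartUpload(new CompleteMultipartUploadRequest(
                    bucket, key, init.getUploadId(), partETags));
        }
    }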

So far the problem shows up consistently. I have attached a tcpdump capture for comparison; in it, the host on the S3 side keeps resetting the socket.

spencerho
1 Answer


Although this is C# code, it shows the timeout and part-size settings that I am using to successfully copy files up to 15 GB. Perhaps AutoCloseStream needs to be set to false.

    using (FileStream fileStream = File.OpenRead(file.FullName))
    {
        TransferUtilityUploadRequest request = new TransferUtilityUploadRequest()
        {
            AutoCloseStream = false,
            Timeout = 1200000,
            BucketName = Settings.Bucket,
            Key = file.Name,
            InputStream = fileStream,
            PartSize = 6291456 // 6MB
        };

        Console.Write("{0}...", file.Name);
        Begin();
        tu.Upload(request);
        End();
        Console.WriteLine("Done. [{0}]", Duration());
    }
Jason Watts
  • Thank you, Jason. Unfortunately, the Java API does not have an AutoCloseStream attribute for me to set; I guess the design or the underlying stream implementation is different, hence no AutoCloseStream attribute. I have tried part sizes of 5 MB, 10 MB, 100 MB, 200 MB, 500 MB and 1 GB while trying to make this work. They all stopped at roughly the same point, after about 8 GB had been uploaded. Based on the tcpdump, the TCP stream was closed from the server (S3) side; when the TCP reset was sent from the HTTP server side, the S3 client was still holding the socket and sending bytes. – spencerho Mar 15 '11 at 18:12
  • Hi Jason, I just read the docs on TransferUtility for C#. They state: "When uploading large files by specifying a file path, as opposed to a stream, TransferUtility uses multiple threads to upload multiple parts of a single upload at once. When dealing with large content sizes and high bandwidth, this can have a significant increase on throughput." Since you pass a FileStream rather than a file path, were you able to verify that the upload is multi-part and multi-threaded? – spencerho Mar 15 '11 at 18:41
  • Found the cause. It is in the Java SDK 1.1.7.1 implementation. Please see this post on the Amazon forum: [RepeatableFileInputStream skip() causes problem for large files >10GB: bug?](forums.aws.amazon.com/thread.jspa?threadID=62975&tstart=0) – spencerho Mar 23 '11 at 17:10
  • Hi Jason, thank you for replying to my question. The cause is indeed in the Amazon Java SDK implementation; it may not be present in the C# library. – spencerho Mar 23 '11 at 17:11