
I am new to AWS S3 and am trying to upload a large file in chunks. From the UI I send the file's chunked data (blob) to a WCF service, which uploads it to S3 using the multipart upload API. Note that the file can be several GB in size, which is why I split it into chunks before uploading to S3.

public UploadPartResponse UploadChunk(Stream stream, string fileName, string uploadId, List<PartETag> eTags, int partNumber, bool lastPart)
{
    stream.Position = 0; // Throwing Exceptions

    //Step 1: build and send a multi upload request
    if (partNumber == 1)
    {
        var initiateRequest = new InitiateMultipartUploadRequest
        {
            BucketName = _settings.Bucket,
            Key = fileName
        };

        var initResponse = _s3Client.InitiateMultipartUpload(initiateRequest);
        uploadId = initResponse.UploadId;
    }

    //Step 2: upload each chunk (this is run for every chunk unlike the other steps which are run once)
    var uploadRequest = new UploadPartRequest
                        {
                            BucketName = _settings.Bucket,
                            Key = fileName,
                            UploadId = uploadId,
                            PartNumber = partNumber,
                            InputStream = stream,
                            IsLastPart = lastPart,
                            PartSize = stream.Length // Throwing Exceptions
                        };

    var response = _s3Client.UploadPart(uploadRequest);

    //Step 3: build and send the multipart complete request
    if (lastPart)
    {
        eTags.Add(new PartETag
        {
            PartNumber = partNumber,
            ETag = response.ETag
        });

        var completeRequest = new CompleteMultipartUploadRequest
        {
            BucketName = _settings.Bucket,
            Key = fileName,
            UploadId = uploadId,
            PartETags = eTags
        };

        try
        {
            _s3Client.CompleteMultipartUpload(completeRequest);
        }
        catch
        {
            //do some logging and return null response
            return null;
        }
    }

    response.ResponseMetadata.Metadata["uploadid"] = uploadRequest.UploadId;
    return response;
}

Here, stream.Position = 0 and stream.Length throw exceptions like the one below:

at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.get_Length()

Then I saw that stream.CanSeek is false.

Do I actually need to buffer the entire stream, loading it into memory in advance, to make this work?

Update: I am doing the following and it works, but I don't know whether it is efficient.

    var ms = new MemoryStream();
    stream.CopyTo(ms);
    ms.Position = 0;

Is there any other way to do this? Thanks in advance.

Setu Kumar Basak
  • I don't think you can modify the stream if you can't seek it; creating another stream and copying all the data into it seems to be the best way. – mahlatse Mar 08 '19 at 15:46

3 Answers

A bit late but note that TransferUtility supports streams directly:

https://docs.aws.amazon.com/AmazonS3/latest/dev/HLuploadFileDotNet.html
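
For example, here is a minimal sketch of my own (not from the linked docs) that hands a stream to TransferUtility; the bucket name and key are placeholders, and depending on the SDK version a non-seekable stream may still be buffered internally:

using Amazon.S3;
using Amazon.S3.Transfer;
using System.IO;

public class TransferUtilityUploadExample
{
    public void UploadStream(IAmazonS3 s3Client, Stream inputStream, string bucketName, string key)
    {
        var transferUtility = new TransferUtility(s3Client);

        var uploadRequest = new TransferUtilityUploadRequest
        {
            InputStream = inputStream,   // the stream received from the client
            BucketName = bucketName,
            Key = key,
            PartSize = 5 * 1024 * 1024   // 5 MB parts; TransferUtility switches to multipart for large payloads
        };

        // TransferUtility initiates, uploads and completes the multipart upload internally
        // (use UploadAsync on targets where the synchronous Upload is not available).
        transferUtility.Upload(uploadRequest);
    }
}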

D2TheC

There is a nice implementation on GitHub.

It uses memory streams to upload parts into S3:

using Amazon.S3;
using Amazon.S3.Model;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

namespace Cppl.Utilities.AWS
{
    public class S3UploadStream : Stream
    {
        /* Note that the maximum size (as of now) of an object in S3 is 5TB, so it isn't
         * safe to assume all uploads will work here.  MAX_PART_LENGTH times MAX_PART_COUNT
         * is ~50TB, which is too big for S3. */
        const long MIN_PART_LENGTH = 5L * 1024 * 1024; // all parts but the last this size or greater
        const long MAX_PART_LENGTH = 5L * 1024 * 1024 * 1024; // 5GB max per PUT
        const long MAX_PART_COUNT = 10000; // no more than 10,000 parts total
        const long DEFAULT_PART_LENGTH = MIN_PART_LENGTH;

        internal class Metadata
        {
            public string BucketName;
            public string Key;
            public long PartLength = DEFAULT_PART_LENGTH;

            public int PartCount = 0;
            public string UploadId;
            public MemoryStream CurrentStream;

            public long Position = 0; // based on bytes written
            public long Length = 0; // based on bytes written or SetLength, whichever is larger (no truncation)

            public List<Task> Tasks = new List<Task>();
            public ConcurrentDictionary<int, string> PartETags = new ConcurrentDictionary<int, string>();
        }

        Metadata _metadata = new Metadata();
        IAmazonS3 _s3 = null;

        public S3UploadStream(IAmazonS3 s3, string s3uri, long partLength = DEFAULT_PART_LENGTH)
            : this(s3, new Uri(s3uri), partLength)
        {
        }

        public S3UploadStream(IAmazonS3 s3, Uri s3uri, long partLength = DEFAULT_PART_LENGTH)
            : this (s3, s3uri.Host, s3uri.LocalPath.Substring(1), partLength)
        {
        }

        public S3UploadStream(IAmazonS3 s3, string bucket, string key, long partLength = DEFAULT_PART_LENGTH)
        {
            _s3 = s3;
            _metadata.BucketName = bucket;
            _metadata.Key = key;
            _metadata.PartLength = partLength;
        }

        protected override void Dispose(bool disposing)
        {
            if (disposing)
            {
                if (_metadata != null)
                {
                    Flush(true);
                    CompleteUpload();
                }
            }
            _metadata = null;
            base.Dispose(disposing);
        }
    
        public override bool CanRead => false;
        public override bool CanSeek => false;
        public override bool CanWrite => true;
        public override long Length => _metadata.Length = Math.Max(_metadata.Length, _metadata.Position);

        public override long Position
        {
            get => _metadata.Position;
            set => throw new NotImplementedException();
        }

        public override int Read(byte[] buffer, int offset, int count) => throw new NotImplementedException();
        public override long Seek(long offset, SeekOrigin origin) => throw new NotImplementedException();

        public override void SetLength(long value)
        {
            _metadata.Length = Math.Max(_metadata.Length, value);
            _metadata.PartLength = Math.Max(MIN_PART_LENGTH, Math.Min(MAX_PART_LENGTH, _metadata.Length / MAX_PART_COUNT));
        }

        private void StartNewPart()
        {
            if (_metadata.CurrentStream != null) {
                Flush(false);
            }
            _metadata.CurrentStream = new MemoryStream();
            _metadata.PartLength = Math.Min(MAX_PART_LENGTH, Math.Max(_metadata.PartLength, (_metadata.PartCount / 2 + 1) * MIN_PART_LENGTH)); 
        }

        public override void Flush()
        {
            Flush(false);
        }

        private void Flush(bool disposing)
        {
            if ((_metadata.CurrentStream == null || _metadata.CurrentStream.Length < MIN_PART_LENGTH) &&
                !disposing)
                return;

            if (_metadata.UploadId == null) {
                _metadata.UploadId = _s3.InitiateMultipartUploadAsync(new InitiateMultipartUploadRequest()
                {
                    BucketName = _metadata.BucketName,
                    Key = _metadata.Key
                }).GetAwaiter().GetResult().UploadId;
            }
            
            if (_metadata.CurrentStream != null)
            {
                var i = ++_metadata.PartCount;

                _metadata.CurrentStream.Seek(0, SeekOrigin.Begin);
                var request = new UploadPartRequest()
                {
                    BucketName = _metadata.BucketName,
                    Key = _metadata.Key,
                    UploadId = _metadata.UploadId,
                    PartNumber = i,
                    IsLastPart = disposing,
                    InputStream = _metadata.CurrentStream
                };
                _metadata.CurrentStream = null;

                var upload = Task.Run(async () =>
                {
                    var response = await _s3.UploadPartAsync(request);
                    _metadata.PartETags.AddOrUpdate(i, response.ETag,
                        (n, s) => response.ETag);
                    request.InputStream.Dispose();
                });
                _metadata.Tasks.Add(upload);
            }
        }

        private void CompleteUpload()
        {
            Task.WaitAll(_metadata.Tasks.ToArray());

            if (Length > 0) {
                _s3.CompleteMultipartUploadAsync(new CompleteMultipartUploadRequest()
                {
                    BucketName = _metadata.BucketName,
                    Key = _metadata.Key,
                    PartETags = _metadata.PartETags.Select(e => new PartETag(e.Key, e.Value)).ToList(),
                    UploadId = _metadata.UploadId
                }).GetAwaiter().GetResult();
            }
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            if (count == 0) return;

            // write as much of the buffer as will fit to the current part, and if needed
            // allocate a new part and continue writing to it (and so on).
            var o = offset;
            var c = Math.Min(count, buffer.Length - offset); // don't over-read the buffer, even if asked to
            do
            {
                if (_metadata.CurrentStream == null || _metadata.CurrentStream.Length >= _metadata.PartLength)
                    StartNewPart();

                var remaining = _metadata.PartLength - _metadata.CurrentStream.Length;
                var w = Math.Min(c, (int)remaining);
                _metadata.CurrentStream.Write(buffer, o, w);

                _metadata.Position += w;
                c -= w;
                o += w;
            } while (c > 0);
        }
    }
}

With slight modifications, you can make it async and use Microsoft.IO.RecyclableMemoryStream to avoid GC pressure.
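
For illustration, here is a usage sketch of my own (not part of the gist above), assuming a placeholder bucket and key: the incoming, non-seekable request stream is copied into the S3UploadStream, so only about one part is buffered in memory at a time.

using Amazon.S3;
using Cppl.Utilities.AWS;
using System.IO;

public class S3UploadStreamExample
{
    public void Upload(IAmazonS3 s3Client, Stream incomingStream)
    {
        // "my-bucket" and the key below are placeholders.
        using (var s3Stream = new S3UploadStream(s3Client, "my-bucket", "uploads/bigfile.bin"))
        {
            // CopyTo reads the source in small buffers and writes into S3UploadStream,
            // which flushes parts of at least 5 MB to S3 as they fill up.
            incomingStream.CopyTo(s3Stream);
        } // Dispose() uploads the final part and completes the multipart upload.
    }
}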

Inbar Barkai

That's a fair way of doing it, but I opted for a different approach: uploading directly to S3 using presigned URLs. This takes some load off your server and reduces data transfer.

Depending on your application, it may be worth considering this:

In C#, get the presigned URL:

public string GetPreSignedUrl(string bucketName, string keyPrefix, string fileName)
{
    var client = new AmazonS3Client(_credentials, _region);
    var keyName = $"{keyPrefix}/{fileName}";
    var preSignedUrlRequest = new GetPreSignedUrlRequest()
    {
        BucketName = bucketName,
        Key = keyName,
        Verb = HttpVerb.PUT, // sign the URL for PUT so it can be used for uploading
        Expires = DateTime.Now.AddMinutes(5),
        Protocol = Protocol.HTTPS
    };
    return client.GetPreSignedURL(preSignedUrlRequest);
}

This creates a URL that the client can use to upload directly to S3, which you need to pass to the UI. You can then perform a multipart upload using presigned URLs.
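
As a rough illustration of the client side, here is a sketch of my own (assuming a simple single PUT rather than the full multipart flow, with HttpClient standing in for whatever the UI actually uses); the URL must have been signed for the PUT verb:

using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public class PresignedUrlUploader
{
    private static readonly HttpClient _httpClient = new HttpClient();

    public async Task UploadAsync(string preSignedUrl, Stream fileStream)
    {
        using (var content = new StreamContent(fileStream))
        {
            // The presigned URL already encodes the bucket, key, expiry and signature,
            // so a plain HTTP PUT of the file contents is all that is needed.
            var response = await _httpClient.PutAsync(preSignedUrl, content);
            response.EnsureSuccessStatusCode();
        }
    }
}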

Here is a good example of multipart upload using axios: https://github.com/prestonlimlianjie/aws-s3-multipart-presigned-upload/blob/master/frontend/pages/index.js

Matt D