My code runs in a small VM (low memory and low disk space). It receives a stream of unknown size over HTTP and uploads it to AWS S3. The stream could be a couple of TB, and I want to be able to reach S3's 5 TB object maximum. I don't receive a Content-Length header on the stream, and the server is made by another company: the data is compressed on the fly, and they have neither the memory nor the disk space to buffer it on their side either.
Our code is in Node.js, and I'm struggling to work within these constraints.
When using multipart upload, each part would have to be 500 MB (worst case: 5 TB divided by the 10,000-part limit), which doesn't fit in memory, and certainly not in a queue of pending parts. The parts have to be held in memory to compute the body hash header and to measure their length.
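For illustration, here is a minimal sketch of that buffered loop with `@aws-sdk/client-s3` (bucket, key and stream names are placeholders, not my real code); the `Buffer.concat` is the step that blows the memory budget:

```js
// Sketch only: each ~500 MB part is fully accumulated in memory before
// UploadPartCommand can be sent, which is what the small VM can't afford.
const { S3Client, CreateMultipartUploadCommand, UploadPartCommand,
        CompleteMultipartUploadCommand } = require("@aws-sdk/client-s3");

const PART_SIZE = 500 * 1024 * 1024; // worst case: 5 TB / 10,000 parts
const s3 = new S3Client({});

async function uploadStream(stream, Bucket, Key) {
  const { UploadId } = await s3.send(new CreateMultipartUploadCommand({ Bucket, Key }));
  const parts = [];
  let chunks = [], size = 0, partNumber = 1;

  const flush = async () => {
    const Body = Buffer.concat(chunks); // whole part held in memory here
    const { ETag } = await s3.send(new UploadPartCommand({
      Bucket, Key, UploadId, PartNumber: partNumber, Body,
    }));
    parts.push({ ETag, PartNumber: partNumber++ });
    chunks = []; size = 0;
  };

  for await (const chunk of stream) {
    chunks.push(chunk); size += chunk.length;
    if (size >= PART_SIZE) await flush();
  }
  if (size > 0) await flush();

  await s3.send(new CompleteMultipartUploadCommand({
    Bucket, Key, UploadId, MultipartUpload: { Parts: parts },
  }));
}
```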
I looked into the aws-chunked encoding, but it requires knowing the total length of the transfer up front to populate the x-amz-decoded-content-length header, even though it allows arbitrarily small chunks, unlike multipart.
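If I'm reading the SigV4 streaming docs correctly, the request headers would have to look roughly like this (values are placeholders), and the decoded-content-length line is exactly what I can't provide:

```js
// Illustrative headers for a SigV4 chunked (aws-chunked) upload; values are
// placeholders. x-amz-decoded-content-length needs the total size up front.
const headers = {
  "Content-Encoding": "aws-chunked",
  "x-amz-content-sha256": "STREAMING-AWS4-HMAC-SHA256-PAYLOAD",
  "x-amz-decoded-content-length": "<total object size in bytes>", // unknown in my case
  "Content-Length": "<wire size including chunk signatures>",     // also unknown
};
```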
The only path I see right now is re-implementing a non-buffering, non-queuing version of multipart upload that sends 500 MB parts over HTTPS with an unsigned-payload header, pads the last part with zeros, completes the upload, and then goes back to clip the padded tail off the object (see the sketch below). This seems incredibly convoluted; I must have missed something.
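To make the "clip the padding" step concrete: as far as I can tell, the only way to trim the tail would be a second multipart upload that server-side copies the unpadded byte range back onto the same key with UploadPartCopy (each copied range is capped at 5 GB), roughly like the sketch below. `realLength` and the other names are placeholders, and I'm not even sure this copy-onto-itself approach is sound:

```js
// Rough outline of the "clip the zero padding" pass (assumptions: copying an
// object onto its own key via multipart copy works; realLength is tracked
// during the original upload). Each UploadPartCopy range is limited to 5 GB.
const { S3Client, CreateMultipartUploadCommand, UploadPartCopyCommand,
        CompleteMultipartUploadCommand } = require("@aws-sdk/client-s3");

async function clipPadding(s3, Bucket, Key, realLength) {
  const { UploadId } = await s3.send(new CreateMultipartUploadCommand({ Bucket, Key }));
  const RANGE = 5 * 1024 ** 3; // stay within the 5 GB per-copy limit
  const parts = [];
  for (let start = 0, n = 1; start < realLength; start += RANGE, n++) {
    const end = Math.min(start + RANGE, realLength) - 1;
    const { CopyPartResult } = await s3.send(new UploadPartCopyCommand({
      Bucket, Key, UploadId, PartNumber: n,
      CopySource: `${Bucket}/${Key}`,
      CopySourceRange: `bytes=${start}-${end}`, // skip the padded tail
    }));
    parts.push({ ETag: CopyPartResult.ETag, PartNumber: n });
  }
  await s3.send(new CompleteMultipartUploadCommand({
    Bucket, Key, UploadId, MultipartUpload: { Parts: parts },
  }));
}
```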
I am OK with using the latest version of the AWS SDK for JavaScript (v3), or version 2, or any library.