If space is at a premium at the location where you initially have the file, then uploading the file to S3, and subsequently downloading, compressing, and re-uploading it to S3 from an EC2 instance in the same region as the S3 bucket, is actually a very sensible (if seemingly counter-intuitive) suggestion, for one simple reason:
AWS does not charge you for bandwidth between EC2 and S3 within the same region.
This is an ideal job for a spot instance... and a good use case for SQS to tell the spot machine what needs to be done.
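To sketch what that queueing could look like (a rough boto3 example, not anything from my actual setup; the queue name and message format are placeholders): the uploader drops a job message, and the spot worker polls for it, does the download/compress/re-upload, and deletes the message when it's done.

# Rough sketch only: signal a spot worker via SQS (queue name and message
# format are placeholders; assumes boto3 and normal AWS credential config).
import json
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="compress-jobs")["QueueUrl"]

# Producer side: enqueue a job once the raw file has landed in S3.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({"bucket": "the-bucket", "key": "the/path"}),
)

# Consumer side (on the spot instance): poll for work, process it, delete it.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    job = json.loads(msg["Body"])
    # ... download job["bucket"]/job["key"], compress, re-upload ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])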
On the other hand... you're spending more of your local bandwidth uploading that file if you don't compress it first.
If you are a programmer, you should be able to craft a utility similar to the one I have written for internal use (this is not a plug; it's not currently available for release) that compresses (via external tools) and uploads files to S3 on the fly.
It works something like this pseudocode example command line:
cat input_file | gzip -9c | stream-to-s3 --bucket 'the-bucket' --key 'the/path'
That's a simplified usage example to illustrate the concept. Of course, my "stream-to-s3" utility accepts a number of other arguments, including x-amz-meta metadata and the AWS access key and secret, but you get the idea.
Common compression utilities like gzip, pigz, bzip2, pbzip2, xz, and pixz can all read the source file from STDIN and write the compressed data to STDOUT, without ever writing the compressed version of the file to disk.
The utility I use reads the file data from its STDIN via the pipeline and, using S3 Multipart Upload (even for small files that don't technically need it, because S3 Multipart Upload cleverly does not require you to know the size of the file in advance), simply keeps sending data to S3 until it reaches EOF on its input stream. Then it completes the multipart upload and verifies that everything succeeded.
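If you want a feel for the mechanics without writing the raw HTTP yourself, here's a rough boto3 sketch of the same idea (not my actual utility, which doesn't use an SDK; this also leaves out metadata, retries, and credential options):

#!/usr/bin/env python3
# Rough sketch of a "stream STDIN to S3" tool using boto3's low-level
# multipart calls. Not my actual utility (which speaks the S3 REST API
# directly); assumes boto3 and the usual AWS credential configuration.
import argparse
import sys

import boto3

PART_SIZE = 8 * 1024 * 1024  # every part except the last must be >= 5 MiB

def main():
    parser = argparse.ArgumentParser(description="Stream STDIN to an S3 object")
    parser.add_argument("--bucket", required=True)
    parser.add_argument("--key", required=True)
    args = parser.parse_args()

    s3 = boto3.client("s3")
    upload_id = s3.create_multipart_upload(Bucket=args.bucket, Key=args.key)["UploadId"]
    parts = []
    try:
        part_number = 1
        while True:
            chunk = sys.stdin.buffer.read(PART_SIZE)  # blocks until a full part or EOF
            if not chunk:
                break
            resp = s3.upload_part(
                Bucket=args.bucket, Key=args.key, UploadId=upload_id,
                PartNumber=part_number, Body=chunk,
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
        # Assumes non-empty input: completing with zero parts is an error.
        s3.complete_multipart_upload(
            Bucket=args.bucket, Key=args.key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so abandoned parts don't keep accruing storage charges.
        s3.abort_multipart_upload(Bucket=args.bucket, Key=args.key, UploadId=upload_id)
        raise

if __name__ == "__main__":
    main()

A production version would also retry failed parts and watch the 10,000-part limit, but the shape is the whole trick: create the upload, send parts until EOF, then complete (or abort on failure).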
I use this utility to build and upload entire tarballs, with compression, without ever touching a single block of disk space. Again, it was not particularly difficult to write, and could have been done in any number of languages. I didn't even use an S3 SDK; I rolled my own from scratch, using a standard HTTP user agent and the S3 API documentation.