
I want to download the contents of an S3 bucket (hosted on Wasabi, which claims to be fully S3 compatible) to my VPS, tar, gzip and gpg-encrypt it, and re-upload the resulting archive to another S3 bucket on Wasabi!

My VPS only has 30 GB of storage, while the whole bucket is about 1000 GB in size, so I need to download, archive, encrypt and re-upload all of it on the fly, without storing the data locally.

The secret seems to be in using the | pipe. But I am stuck at the very first step of downloading the bucket into an archive locally (I want to go step by step):

s3cmd sync s3://mybucket | tar cvz archive.tar.gz -

In my mind, the end result would look something like this:

s3cmd sync s3://mybucket | tar cvz | gpg --passphrase secretpassword | s3cmd put s3://theotherbucket/archive.tar.gz.gpg

but it's not working so far!

What am I missing?

John Rotenstein
Markus
  • What are you actually wanting to achieve? Is your intention to make a _backup_ of the data, or perhaps to save storage costs by compressing it? – John Rotenstein Oct 28 '19 at 02:45
  • My goal is to make a backup of the files in one S3 bucket (used by a Nextcloud instance with external storage support), compressed and encrypted into one archive. I want to achieve that using a small VPS server that has far less storage than the bucket's contents take up. – Markus Oct 28 '19 at 15:54
  • You might be able to use a backup utility like [Cloudberry Backup](https://www.cloudberrylab.com/backup.aspx), since it understands how to use S3. – John Rotenstein Oct 28 '19 at 22:00
  • Thanks for the help so far. As I understand it, Cloudberry Backup is a desktop client that helps make regular backups of specific files/folders. It looks like it cannot help with my exact problem, right? – Markus Oct 29 '19 at 05:49
  • While it would not make a tar file, it can back up files to/from S3 and can compress files (I think). – John Rotenstein Oct 29 '19 at 05:57
  • But what I want to achieve is a single compressed archive of all my files which I could easily download/transfer to other servers or to my machine at home, instead of using a tool like s3cmd or CloudBerry Backup to download the hundreds of thousands of files which are now in my S3 bucket. – Markus Oct 29 '19 at 07:02
  • So to sum it up: there is no stdout output from aws-cli or s3cmd, so my problem can't be solved the way I want, right? Thanks so far for a lot of help and input. – Markus Oct 29 '19 at 08:50

2 Answers


The aws s3 sync command copies multiple files to the destination. It does not copy to stdout.

You could use aws s3 cp s3://mybucket/object - (including the dash at the end) to copy the contents of a single object to stdout.

From cp — AWS CLI Command Reference:

The following cp command downloads an S3 object locally as a stream to standard output. Downloading as a stream is not currently compatible with the --recursive parameter:

aws s3 cp s3://mybucket/stream.txt -

This will only work for a single file.
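
For a single object, that stream can be chained all the way through to the destination bucket, because aws s3 cp can also upload from stdin when the source is -. The following is only a minimal sketch, not the answer's own command: the Wasabi --endpoint-url value, the object names and the GnuPG flags (--batch and --pinentry-mode loopback for a non-interactive passphrase on GnuPG 2.1+) are assumptions to adapt to your setup.

# Sketch: stream one object from the source bucket through gzip and gpg
# into the destination bucket, without writing anything to local disk.
# Endpoint URL, key names and gpg passphrase handling are assumptions.
aws s3 cp s3://mybucket/stream.txt - --endpoint-url https://s3.wasabisys.com \
  | gzip \
  | gpg --batch --pinentry-mode loopback -c --passphrase secretpassword \
  | aws s3 cp - s3://theotherbucket/stream.txt.gz.gpg --endpoint-url https://s3.wasabisys.com
# For streams larger than roughly 50 GB, the final aws s3 cp also needs
# --expected-size so it can choose a suitable multipart chunk size.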

John Rotenstein
  • Thank you for the reply so far. So is there any tool (including s3cmd?) that is able to download the whole bucket and write it to stdout? – Markus Oct 28 '19 at 15:55
  • There's not really a concept of "passing multiple files to stdout". It would just generate a concatenated string of all files joined together. Tar expects a list of filenames, not the contents of files (a per-object workaround is sketched after this comment thread). You might be able to append files to a Zip file, but this gets tricky with S3 since it can only append by using multi-part uploads. The "real" Amazon S3 gets around all these problems by offering a **Storage Class** that is lower-cost but designed for archiving. No need to do strange things with the files. For example, Glacier Deep Archive is 1/6th the price of Wasabi. – John Rotenstein Oct 28 '19 at 22:04
  • I know about Glacier Deep Archive, but it's actually not what I am looking for. I want a secure (gpg), slightly compressed (gzip), regular backup of my whole bucket in a single file, and I want to create it on a machine that does not have enough storage to hold all of it at once on its hard drive. I thought it would be possible. But your explanations shed some light on the S3 syntax/mechanics. It just confused me (in the first place) that I can tar and pipe right into an s3 put command, but not in the other direction. – Markus Oct 29 '19 at 05:52
  • And the reason I am using Wasabi instead of genuine S3 from Amazon is that Glacier/Deep Archive cannot be accessed quickly. Restoring takes time. The price point is okay, but the access times are not. – Markus Oct 29 '19 at 06:34
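
As the comments above explain, tar cannot consume the contents of many objects over one pipe, so a single tar archive is out of reach this way. A rough workaround, sketched below, is to stream each object through the pipeline individually, producing one compressed, encrypted object per source key rather than a single archive. This is an editorial sketch, not part of the original answer: the endpoint URL, the passphrase handling and the key parsing are assumptions, and keys containing spaces would need more careful handling than awk provides here.

# Sketch: list every key in the source bucket and stream each object through
# gzip and gpg into the destination bucket, one at a time, with no local staging.
aws s3 ls s3://mybucket --recursive --endpoint-url https://s3.wasabisys.com \
  | awk '{print $4}' \
  | while read -r key; do
      aws s3 cp "s3://mybucket/$key" - --endpoint-url https://s3.wasabisys.com \
        | gzip \
        | gpg --batch --pinentry-mode loopback -c --passphrase secretpassword \
        | aws s3 cp - "s3://theotherbucket/$key.gz.gpg" --endpoint-url https://s3.wasabisys.com
    done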

You may try https://github.com/kahing/goofys. I guess in your case it could be something like the following:

$ goofys source-s3-bucket-name /mnt/src
$ goofys destination-s3-bucket-name /mnt/dst
$ tar -cvzf - /mnt/src | gpg -e -o /mnt/dst/archive.tgz.gpg
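
A note on the sketch above: goofys mounts each bucket as a FUSE filesystem, so tar reads the source objects and gpg writes the encrypted archive straight into the destination mount as streams, which should avoid staging the full 1000 GB locally (goofys does not cache to disk by default). As written, gpg -e will prompt for a recipient key; if a passphrase-based backup like the one in the question is preferred, a symmetric variant (an assumption, using GnuPG 2.x flags) could look like this:

# Symmetric variant: compress the mounted source and encrypt with a passphrase
# instead of a recipient key (the passphrase flags are GnuPG 2.x assumptions).
tar -cvzf - /mnt/src | gpg --batch --pinentry-mode loopback -c --passphrase secretpassword -o /mnt/dst/archive.tgz.gpg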
  • Please provide answers that don't require clarification from the user. This requires the user to install `goofys` – hongsy Jan 17 '20 at 04:01