I'm using the NOUNZ data compiler on OS X (or Linux), which automatically generates a massive directory structure of static HTML files (hundreds of thousands, and sometimes millions, of files).
A simplified example of the generated directory tree looks something like the following...
Normally, if I want to move the entire tree to a remote web server, I simply tar and compress the tree, using the commands:
tar -cvf HTML.tar HTML
gzip HTML.tar
This generates a tar-ed and compressed file called HTML.tar.gz.
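As an aside, both the BSD tar shipped with OS X and GNU tar on Linux can do the compression in the same step via the -z flag, producing an equivalent HTML.tar.gz:
tar -czvf HTML.tar.gz HTML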
I can then FTP or SCP the above file to the remote web server and simply uncompress and untar it using the following commands:
gzip -d HTML.tar.gz
tar -xvf HTML.tar
This will result in the exact same file tree on the web server that was generated by the data compiler on the local machine.
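For concreteness, the transfer and extraction step looks roughly like this (the user, host, and path here are just placeholders):
scp HTML.tar.gz user@webserver:/var/www/
and on the server the decompress and extract can even be collapsed into one command:
tar -xzvf HTML.tar.gz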
THE PROBLEM: I'd like to mimic the same behavior as above using Amazon Web Services (AWS) Simple Storage Service (S3).
MY QUESTION: What is the best way to mimic the same (or similar) behavior, where I can move the entire tar-ed and compressed tree from a local server to AWS S3, and then uncompress and untar the file to recreate the entire directory structure?
The tar and gzip commands are not part of the S3 CLI, so I need to find a solid way of moving a directory structure that can contain millions of files (which will happen possibly once a day). It would be VERY slow to move and recreate everything without first tar-ing and compressing.
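To be clear about what I have so far: the upload and download halves would presumably be plain aws s3 cp calls, something like the following (the bucket name is only a placeholder):
aws s3 cp HTML.tar.gz s3://my-bucket/HTML.tar.gz
aws s3 cp s3://my-bucket/HTML.tar.gz .
But that still leaves the question of where the archive actually gets uncompressed and untarred, since S3 won't unpack it server-side.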
NOTE: Just an FYI that when the data compiler runs, it always deletes the entire old tree and regenerates a completely new tree, resulting in new inodes for all directories and files. This means "incremental" copies and syncs are not viable; I need to move the whole tree each time.