
Is there a way to copy files to an S3 bucket while preserving the file path?

This is the example:

  1. I produce a list of files that differ between bucket1 and bucket2 using s3cmd sync --dry-run.

The list looks like this:

s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/definition/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/nodetypes.cnd
s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/properties.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/.content.xml
s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/app-store/.content.xml
  2. I need to process this list and upload only the files in the list to a new location in the bucket (e.g. s3://bucket/diff/), BUT with the full path as shown in the list.

A simple loop like this:

diff_file_list=$(s3cmd -c s3cfg sync --dry-run s3://BUCKET/20150831/PROD s3://BUCKET/20150831/DEV | awk '{print $2}')
for f in $diff_file_list; do
    s3cmd -c s3cfg cp $f s3://BUCKET/20150831/DIFF/
done

does not work, because s3cmd cp keeps only the file name, so all the .content.xml files collide under the destination prefix. It produces this:

File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/definition/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/nodetypes.cnd copied to s3://BUCKET/20150831/DIFF/nodetypes.cnd
File s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/properties.xml copied to s3://BUCKET/20150831/DIFF/properties.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml
File s3://BUCKET/20150831/PROD/JC-migration-test-01/jcr_root/content/origin-store/.content.xml copied to s3://BUCKET/20150831/DIFF/.content.xml

Thanks,

Adi Chiru
  • You have $f in your hand so do some basic editing of that using sed/awk or equivalent to produce $g which contains the complete, correct target filename and then copy $f to $g. – jarmod Aug 31 '15 at 23:30
  • yes, I ended up using sed to repeat the line and replace PROD to DIFF. Then I am reading the lines one by one and pass them as parameter to the sync command. Not very elegant but works.... – Adi Chiru Sep 01 '15 at 22:58

1 Answer


Short answer: no, it is not! That is because the paths in S3 buckets are not actually directories/folders; an S3 bucket has no concept of structure, even if various tools present it that way (including s3cmd, which is really confusing...).

So the "path" is actually a key prefix (although s3cmd sync to a local destination knows how to translate this prefix into a directory structure on your filesystem).
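That means preserving the layout requires spelling out the full destination key yourself. A minimal sketch of the idea, using one of the hypothetical keys from the question (the prefix swap is done here with a plain bash substitution):

src=s3://BUCKET/20150831/PROD/JC-migration-test-01/META-INF/vault/nodetypes.cnd
# replace the first occurrence of PROD with DIFF, keeping the rest of the key
dst=${src/PROD/DIFF}
# copy to the full target key instead of a bare prefix
s3cmd -c s3cfg cp "$src" "$dst"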

For a bash script the solution is:

  1. Create a file listing all the paths from an s3cmd sync --dry-run command (basically a list of diffs) => file1

  2. Copy that file and use sed to modify the paths as needed: sed 's|^\(s3.*\)PROD|\1DIFF|' => file2

  3. Merge the files so that line 1 of file1 is followed by line 1 of file2, and so on: paste file1 file2 > final.txt

  4. Read final.txt line by line in a loop and use each line as a set of 2 parameters to a copy or sync command:

while IFS='' read -r line || [[ -n "$line" ]]; do
    s3cmd -c s3cfg sync $line
done < "final.txt"

Notes:

  1. $line in the s3cmd command must not be in quotes; if it is, the sync command will complain that it received only one parameter... of course! Unquoted, the line splits into the source and target paths that paste glued together.

  2. The [[ -n "$line" ]] is used so that read does not drop the last line if it has no trailing newline character.
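Putting the four steps together, a minimal end-to-end sketch (the bucket, date, and s3cfg file are the hypothetical ones from the question):

#!/bin/bash
# 1. list the keys that differ between PROD and DEV => file1
s3cmd -c s3cfg sync --dry-run s3://BUCKET/20150831/PROD s3://BUCKET/20150831/DEV \
    | awk '{print $2}' > file1

# 2. rewrite the PROD prefix to DIFF to build the target keys => file2
sed 's|^\(s3.*\)PROD|\1DIFF|' file1 > file2

# 3. pair each source key with its target key, one pair per line
paste file1 file2 > final.txt

# 4. feed each "source target" pair to s3cmd; $line is deliberately
#    left unquoted so it expands to two separate arguments
while IFS='' read -r line || [[ -n "$line" ]]; do
    s3cmd -c s3cfg sync $line
done < final.txt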

Unfortunately, Boto could not help more here, so if you need something similar in Python you would do it pretty much the same way....

Adi Chiru