78

I want to use the AWS S3 cli to copy a full directory structure to an S3 bucket.

So far, everything I've tried copies the files to the bucket, but the directory structure is collapsed (to say it another way, each file is copied into the root directory of the bucket).

The command I use is:

aws s3 cp --recursive ./logdata/ s3://bucketname/

I've also tried leaving off the trailing slash on my source designation (i.e., the copy-from argument). I've also used a wildcard to designate all files ... each thing I try simply copies the log files into the root directory of the bucket.

agentv
    Yes! That is definitely the answer. Unlike in Unix, the cp command (and the sync command) does not create a target directory on the destination side unless you ask it to. So if you `aws s3 cp --recursive mylocalsrcdir s3://bucket/` then it will simply put the files from your local directory into the bucket "root directory". If you do `aws s3 cp --recursive mydirectory s3://bucket/mydirectory` then it will recreate the directory structure on the target end. – agentv Apr 16 '15 at 00:27
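To illustrate the difference the comment describes, with placeholder names:

aws s3 cp --recursive mylocalsrcdir s3://bucket/               # contents land directly under the bucket "root"
aws s3 cp --recursive mylocalsrcdir s3://bucket/mylocalsrcdir  # the tree is recreated under the mylocalsrcdir prefix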

8 Answers

88

I believe sync is the method you want. Try this instead:

aws s3 sync ./logdata s3://bucketname/
Chad Smith
    ...I was excited to try that, but it gave me the same results as the cp command. The files from my ./logdata directory were copied to the root "directory" in the bucket. One thing that did work, though, was to try this: `aws s3 sync ./logdata s3://bucketname/logdata`. Thanks for the lead. ---v – agentv Apr 16 '15 at 00:11
  • Unfortunately, even with your suggestion, agentv, I got the same result: sync didn't preserve the directory structure and just flattened everything out. – niharvey Feb 08 '17 at 18:46
    UPDATE: never mind, my directory structure got messed up on the extract – niharvey Feb 08 '17 at 19:25
  • Also worth bearing in mind - `aws s3 sync` will only upload files if they have a later 'last modified date' than what is already there. If not, it will silently leave the existing files in place. – Jason Holloway Sep 08 '22 at 09:57
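To spell out what worked in the comment thread: the destination has to name the directory, and if sync's timestamp check skips files you want re-uploaded, a plain recursive copy to the same prefix forces the transfer:

aws s3 sync ./logdata s3://bucketname/logdata
aws s3 cp --recursive ./logdata s3://bucketname/logdata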
30

The following worked for me:

aws s3 cp ~/this_directory s3://bucketname/this_directory --recursive

AWS will then "make" this_directory and copy all of the local contents into it.
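For example, a hypothetical local file this_directory/sub/app.log would end up under the key:

s3://bucketname/this_directory/sub/app.log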

17

I ran into this problem while using either of these commands.

$ aws s3 cp --recursive /local/dir s3://s3bucket/
OR
$ aws s3 sync /local/dir s3://s3bucket/

I even tried mounting the S3 bucket locally and then running rsync; even that failed (or hung for a few hours), as I have thousands of files.

Finally, s3cmd worked like a charm.

s3cmd sync /local/dir/ --delete-removed s3://s3bucket/ --exclude="some_file" --exclude="*directory*"  --progress --no-preserve

This not only does the job well and shows quite verbose output on the console, but also uploads big files in multiple parts.
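For reference, the same command again with the flags spelled out (descriptions paraphrased from the s3cmd help; adjust the excludes to your own paths):

s3cmd sync /local/dir/ s3://s3bucket/ --delete-removed --exclude="some_file" --exclude="*directory*" --progress --no-preserve
# --delete-removed : delete destination objects that no longer exist locally
# --exclude=GLOB   : skip local paths matching the glob (can be given more than once)
# --progress       : display a progress meter for each transfer
# --no-preserve    : do not store filesystem attributes with the uploaded objects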

vikas027
    tl;dr: wildcard file globbing worked better in s3cmd for me. As cool as aws-cli is -- for my one-time S3 file manipulation issue that didn't immediately work as I would hope and thought it might -- I ended up installing and using s3cmd. Whatever syntax and behind-the-scenes work I conceptually imagined, s3cmd was more intuitive and accommodating to my baked-in preconceptions. Maybe it isn't the answer you came here for, but it worked for me. – BradChesney79 Mar 08 '17 at 20:08
  • That is useful @BradChesney79 – agentv Feb 22 '18 at 03:02
  • It would be good to describe the options you are using on the sync command. Also, is there no "cp" command for s3cmd? Why use sync instead of cp? – VinGarcia Apr 03 '18 at 10:38
6

(Improving the solution of Shishir)

  • Save the following script in a file (I named the file s3Copy.sh)
path=$1  # the path of the directory whose files and sub-directories need to be copied
s3Dir=$2 # the S3 bucket path

for entry in "$path"/*; do
    name=$(basename "$entry")  # the name of the file or directory
    if [[ -d $entry ]]; then   # it is a directory
        aws s3 cp --recursive "$entry" "$s3Dir/$name/"
    else                       # it is a file
        aws s3 cp "$entry" "$s3Dir/"
    fi
done
  • Run it as follows:
    /PATH/TO/s3Copy.sh /PATH/TO/ROOT/DIR/OF/SOURCE/FILESandDIRS PATH/OF/S3/BUCKET
    For example if s3Copy.sh is stored in the home directory and I want to copy all the files and directories located in the current directory, then I run this:
    ~/s3Copy.sh . s3://XXX/myBucket

You can easily modify the script to allow for other arguments of s3 cp such as --include, --exclude, ...
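For example, a hypothetical variant of the script that forwards any extra arguments straight to aws s3 cp:

path=$1
s3Dir=$2
shift 2  # everything after the first two arguments is passed through to aws s3 cp

for entry in "$path"/*; do
    name=$(basename "$entry")
    if [[ -d $entry ]]; then
        aws s3 cp --recursive "$entry" "$s3Dir/$name/" "$@"
    else
        aws s3 cp "$entry" "$s3Dir/" "$@"
    fi
done

which could then be invoked as, say, ~/s3Copy.sh . s3://XXX/myBucket --exclude "*.tmp".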

LoMaPh
  • Hint: if you're in the habit of #!/bin/sh at the top of your scripts, #!/bin/bash this one. The [[ ]] is added in bash: https://stackoverflow.com/a/3427931/509907 – Stateful Nov 20 '21 at 21:39
3

Use the following script to copy the folder structure:

asset_directory="logdata/"  # local source directory (example value; a relative path ending in a slash)
s3Folder="s3://xyz.abc.com/asdf"

for entry in "$asset_directory"*
do
    echo "Processing - $entry"
    if [[ -d $entry ]]; then
        echo "directory"
        aws s3 cp --recursive "./$entry" "$s3Folder/$entry/"
    else
        echo "file"
        aws s3 cp "./$entry" "$s3Folder/"
    fi
done
3

I couldn't get s3 sync or s3 cp to work on a 55 GB folder with thousands of files and over 2 dozen subdirectories inside. Trying to sync the whole folder would just cause awscli to fail silently without uploading anything to the bucket.

Ended up doing this to first sync all subdirectories and their contents (folder structure is preserved):

nice find . -mindepth 1 -maxdepth 1 -type d | cut -c 3- | while read -r line; do aws s3 sync "$line" "s3://bucketname/$line"; done

Then I did this to get the 30,000 files in the top level:

nice find . -mindepth 1 -maxdepth 1 -type f | cut -c 3- | while read -r line; do aws s3 cp "$line" "s3://bucketname/"; done

Make sure to watch the load on the server (pro tip: you can use w to check the load) and press Ctrl-Z to suspend a command if the load gets too high (fg resumes it).

Putting this here in case it helps anyone in a similar situation.

Notes:

-mindepth 1 excludes .

-maxdepth 1 prevents find from listing contents of sub-directories, since s3 sync handles those successfully.

cut -c 3- removes the "./" from the beginning of each result from find.
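The two passes could also be folded into a single loop over the top-level entries (a sketch under the same assumptions; the bucket name is a placeholder):

nice find . -mindepth 1 -maxdepth 1 | cut -c 3- | while read -r entry; do
    if [ -d "$entry" ]; then
        aws s3 sync "$entry" "s3://bucketname/$entry"   # sub-directory: sync preserves its structure
    else
        aws s3 cp "$entry" "s3://bucketname/"           # top-level file
    fi
done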

twhitney
1

This works for me: aws s3 sync mydir s3://rahuls-bucket/mydir

brahul
1

Alternatively, you could also try the MinIO Client, aka mc.

$ mc cp Desktop/test/test/test.txt s3/miniocloud/Desktop/test/test/
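For a whole directory tree like the one in the question, a recursive copy should keep the structure (a sketch, assuming the s3 alias used above is already configured; mc mirror is another option):

$ mc cp --recursive ./logdata/ s3/bucketname/logdata/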

Hope it helps.

PS: I am one of the contributors to the project.

koolhead17
    Credit where credit is due: mc did the job and preserved the dir structure - awesome! I was already pissed off about having to install >200 megabytes of Python & pip crap to use awscli, only to read here that it collapses the dir structure. – joonas.fi Aug 15 '16 at 14:09