
I am trying to copy data from S3 to my local machine, filtered by a key prefix, using the AWS CLI.

But I get an error no matter which wildcard pattern I try:

aws s3 cp s3://my-bucket-name/RAW_TIMESTAMP_0506* . --profile prod

error:

no matches found: s3://my-bucket-name/RAW_TIMESTAMP_0506*

John Rotenstein
Bhavesh

4 Answers

aws s3 cp s3://my-bucket/ <local directory path> --recursive --exclude "*" --include "<prefix>*"

This will copy only the files with the given prefix.
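For instance, applied to the bucket, prefix, and profile from the question (assuming the objects sit at the top level of the bucket), this would look something like:

aws s3 cp s3://my-bucket-name/ . --recursive --exclude "*" --include "RAW_TIMESTAMP_0506*" --profile prod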

Eyshika

The above answers do not work properly for me. For example, I have many thousands of files in a directory, organized by date, and I wish to retrieve only the files that are needed, so I tried the correct version per the documentation:

aws s3 cp s3://mybucket/sub /my/local/ --recursive --exclude "*" --include "20170906*.png"

and it did not download just the prefixed files, but began to download everything

so then I tried the sample above:

aws s3 cp s3://mybucket/sub/ . /my/local --recursive --include "20170906*"

and it also downloaded everything. It seems this is an ongoing issue with the AWS CLI, and there is no apparent intention to fix it. Here are some workarounds I found while Googling, but they are less than ideal:

https://github.com/aws/aws-cli/issues/1454
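As the comments below note, --include on its own has no effect because every object is included by default; pairing it with --exclude "*" (as in the other answers) is what makes the filter take effect. A version of the second attempt above with that change, untested and using the same example paths, would be:

aws s3 cp s3://mybucket/sub/ /my/local --recursive --exclude "*" --include "20170906*"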

Patrick Francis
  • Use --exclude "*" along with your command above; it should work. – Bhavesh Sep 11 '17 at 12:34
  • For an S3 file named `dev20200808.json`, I tried `aws s3 cp s3://my-bucket-name/ ./path-to-my-local-folder --exclude "*" --include "*20200808*" --recursive` and it worked! At first I only had `--include`, thinking only matching objects would be copied, but sadly everything in the bucket was copied; then I used `--exclude` along with `--include` to make it work. So it seems you first `exclude` everything and then `include` what is needed :-) Also, if you give `.` instead of `./path-to-my-local-folder`, the files are copied into the directory you run the command from. – whoami - fakeFaceTrueSoul Aug 10 '20 at 21:13
  • A good workaround from the Git issue linked: `aws s3 ls "s3://bucket/prefix" | awk '{print $4}' | xargs -I % aws s3 cp s3://bucket/% .` – thisisdee Oct 28 '20 at 11:08

Updated: Added --recursive and --exclude

The aws s3 cp command will not accept a wildcard as part of the filename (key). Instead, you must use the --include and --exclude parameters to define filenames.

From: Use of Exclude and Include Filters

Currently, there is no support for the use of UNIX style wildcards in a command's path arguments. However, most commands have --exclude "<value>" and --include "<value>" parameters that can achieve the desired result. These parameters perform pattern matching to either exclude or include a particular file or object. The following pattern symbols are supported.

So, you would use something like:

aws s3 cp --recursive s3://my-bucket-name/ . --exclude "*" --include "RAW_TIMESTAMP_0506*"
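If more than one prefix is needed, filters can be chained; they are applied in the order given and later filters take precedence, so the usual pattern is a single --exclude "*" followed by one --include per prefix. The second prefix here is purely illustrative:

aws s3 cp --recursive s3://my-bucket-name/ . --exclude "*" --include "RAW_TIMESTAMP_0506*" --include "RAW_TIMESTAMP_0507*"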
John Rotenstein

If you don't like silent consoles, you can pipe aws s3 ls through awk and back to aws s3 cp.

Example

# url must be the entire prefix that includes folders.
# Ex.: url='s3://my-bucket-name/folderA/folderB',
# not url='s3://my-bucket-name'
url='s3://my-bucket-name/folderA/folderB'
prefix='RAW_TIMESTAMP_0506'
aws s3 ls "$url/$prefix" | awk '{system("aws s3 cp '"$url"'/"$4 " .")}'

Explanation

  • The ls part is pretty simple. I'm using variables to simplify and shorten the command. Always wrap shell variables in double quotes to prevent disaster.
  • awk '{print $4}' would extract only the filenames from the ls output (NOT the S3 key! This is why url must be the entire prefix that includes folders.)
  • awk '{system("echo "$4)}' would do the same thing, but it accomplishes this by calling another command. Note: I did NOT use a subshell $(...), because that would run the entire ls | awk part before starting cp. That would be slow, and it wouldn't print anything for a looong time.
  • awk '{system("echo aws s3 cp "$4 " .")}' would print commands that are very close to the ones we want. Pay attention to the spacing. If you try to run this, you'll notice something isn't quite right. This would produce commands like aws s3 cp RAW_TIMESTAMP_05060402_whatever.log .
  • awk '{system("echo aws s3 cp '$url'/"$4 " .")}' is what we're looking for. This adds the path to the filename. Look closely at the quotes. Remember we wrapped the awk parameter in single quotes, so we have to close and reopen the quotes if we want to use a shell variable in that parameter.
  • awk '{system("aws s3 cp '"$url"'/"$4 " .")}' is the final version. We just remove echo to actually execute the commands created by awk. Of course, I've also surrounded the $url variable with double quotes, because it's good practice.
Zachary Ryan Smith
musicin3d
  • I wasn't bothered by the quietness, but I was bothered with timeouts when the copying took a long time and the token eventually expired. I ended up doing something vaguely like this, copying a listing of the files I wanted to my laptop and then copying over them one by one in a loop with `xargs`. – tripleee Jul 30 '20 at 13:14
  • @tripleee I would love to see your xargs solution! – musicin3d Jul 30 '20 at 14:24
  • Basically `aws s3 ls s3://path/to/prefix/ | sed 's%^%s3://path/to/prefix/%' > file.txt` and then `xargs -i aws s3 cp {} . < file.txt` (a fuller sketch follows these comments). – tripleee Jul 30 '20 at 14:46
  • This is my favorite answer for the feedback loop in terminal. Thank you! – Vinay Feb 16 '21 at 02:04
  • If you are not pointing directly to the files (but to other paths in between) and want to download files recursively you should use: `aws s3 ls "$url/$prefix" | awk '{system("aws s3 cp '"$url"'/"$2 " ./"$2 " --recursive ")}'` – Guillermo Garcia Oct 11 '22 at 13:39
  • This answer is faster than the other answers if you have hundreds of thousands of files in `$url`. – Brian Jan 07 '23 at 07:21
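A minimal sketch of the resumable variant described in tripleee's comments, assuming the keys contain no spaces; the bucket path and prefix are just the placeholders used in the answer above. Saving the listing to a file first means the copy loop can be trimmed and restarted if credentials expire partway through:

# List matching keys once and save them (the key is the 4th column of aws s3 ls output).
url='s3://my-bucket-name/folderA/folderB'
prefix='RAW_TIMESTAMP_0506'
aws s3 ls "$url/$prefix" | awk '{print $4}' > files.txt

# Copy one object at a time; remove already-copied lines from files.txt and rerun to resume.
xargs -I {} aws s3 cp "$url/{}" . < files.txt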