60

I am having trouble downloading multiple files from AWS S3 buckets to my local machine.

I have a list of all the filenames that I want to download, and I do not want the others. How can I do that? Is there any kind of loop in the aws-cli that lets me do some iteration?

There are a couple of hundred files I need to download, so it does not seem possible to use a single command that takes all the filenames as arguments.

DQI
  • 725
  • 1
  • 5
  • 7
  • you can look at `aws s3api get-object` if you're able to filter/query the list of your files. If you have the list in a file, you can read the file line by line and pipe it to `aws s3 cp s3://yourbucket/-` – Frederic Henri Jun 24 '16 at 20:46
  • Does this answer your question? [how to include and copy files that are in current directory to s3 (and not recursively)](https://stackoverflow.com/questions/21711300/how-to-include-and-copy-files-that-are-in-current-directory-to-s3-and-not-recur) – Channa Aug 24 '20 at 17:09
  • @FredericHenri could you elaborate on how to read a file in this case? – TechNewbie Sep 15 '22 at 21:38

8 Answers

77

One can also use the --recursive option, as described in the documentation for the cp command. It will copy all objects under a specified prefix recursively.

Example:

aws s3 cp s3://folder1/folder2/folder3 . --recursive

will grab all files under folder1/folder2/folder3 and copy them to the local directory.
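
Since a recursive copy pulls everything under the prefix (and transfers are billed), the --dryrun flag of aws s3 cp can be used to preview what would happen first; a minimal sketch using the same path as above:

aws s3 cp s3://folder1/folder2/folder3 . --recursive --dryrun

Each output line shows a copy that would be performed; drop --dryrun once the list looks right.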

Eldad Assis
  • 10,464
  • 11
  • 52
  • 78
siphiuel
  • 3,480
  • 4
  • 31
  • 34
  • 3
    Powerful, but an (obvious?) warning to use with care. AWS charges for every in/out file transfer. And when combined with the `rm` command, check your syntax to avoid accidental deletion! – AlainD Sep 19 '19 at 09:34
  • Is there any specific order in which the files will be copied? For files having an ending number (file_1, file_2, ...), will they be copied in order, or can nothing be said about it? – VictorHMartin Dec 13 '21 at 10:54
  • 1
    this doesn't answer the question, which wants to only copy specific files, by name/filepath. – Hugh Perkins May 31 '22 at 21:15
35

You might want to use "sync" instead of "cp". The following will download/sync only the files with the ".txt" extension into your local folder:

aws s3 sync --exclude="*" --include="*.txt" s3://mybucket/mysubbucket .
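
Since --exclude and --include can be given more than once, the same approach can also pull a specific set of named files; a sketch, assuming hypothetical object names in the same bucket:

aws s3 sync s3://mybucket/mysubbucket . --exclude="*" --include="report-01.txt" --include="report-02.txt"

For a couple of hundred names this gets unwieldy on a single command line, which is where the loop-based answers below come in.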
f.cipriani
  • 3,357
  • 2
  • 26
  • 22
  • 6
    I would like to use the above command to copy just 100 files (for example). Is there a clever way/parameter that can be used to do this? – Paul Pritchard Feb 19 '18 at 15:13
33

Here is a bash script which reads all the filenames from a file filename.txt and downloads them one by one:

#!/bin/bash
set -e                 # stop at the first failed copy
while read -r line     # one object key per line
do
  aws s3 cp "s3://bucket-name/$line" dest-path/
done < filename.txt
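
If the sequential loop is too slow, the same list can be fed to xargs so several copies run at once; a sketch, assuming the same filename.txt with one object key per line, where -P sets how many aws s3 cp processes run in parallel:

xargs -P 8 -I {} aws s3 cp "s3://bucket-name/{}" dest-path/ < filename.txt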
Andrea Bergonzo
  • 3,983
  • 4
  • 19
  • 31
Rajan
  • 392
  • 2
  • 5
  • Thanks, this is definitely a better way to do that. – DQI Jun 25 '16 at 18:37
  • Yes, this is a much better way to do it compared to all the other answers posted here. Thank you for sharing this answer. – Shabbir Bata May 29 '18 at 14:39
  • 2
    `set -e stops the execution of a script if a command or pipeline has an error - which is the opposite of the default shell behaviour, which is to ignore errors in scripts` – Mr_and_Mrs_D Dec 05 '18 at 03:11
  • 5
    This will download files one after the other - need a way to do it in parallel – Mr_and_Mrs_D Dec 05 '18 at 03:18
  • 4
    It's too slow; maybe it's taking time locating those files. Is there any way I can send multiple file requests at the same time, so the files are located and downloaded in parallel? – user3085459 Sep 09 '19 at 09:45
  • Yea, this is too slow. What I really want is to use the --include flag, but match a list of a million files. – Luke Kurlandski Jun 01 '23 at 15:37
30

As per the docs, you can use include and exclude filters with s3 cp as well. So you can do something like this:

aws s3 cp s3://bucket/folder/ . --recursive --exclude="*" --include="2017-12-20*"

Make sure you get the order of exclude and include filters right as that could change the whole meaning.
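
To see why the order matters, here is a sketch contrasting the two orderings on the same prefix; filters given later take precedence:

aws s3 cp s3://bucket/folder/ . --recursive --exclude="*" --include="2017-12-20*"   # copies only the 2017-12-20 objects
aws s3 cp s3://bucket/folder/ . --recursive --include="2017-12-20*" --exclude="*"   # the trailing exclude wins, so nothing is copied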

Milo
  • 3,365
  • 9
  • 30
  • 44
Chinmay B
  • 427
  • 4
  • 7
  • 2
    Your last line _"Make sure you get the order of exclude and include filters right as that could change the whole meaning."_ is especially helpful (I had them reversed). Thanks. – newfie_coder Jun 21 '18 at 15:46
6

Tried all the above. Not much joy. Finally, adapted @rajan's reply into a one-liner:

for file in whatever*.txt; do aws s3 cp "$file" s3://somewhere/in/my/bucket/; done
Hugh Perkins
  • 7,975
  • 7
  • 63
  • 71
5

I wanted to read s3 object keys from a text file and download them to my machine in parallel.

I used this command:

cat <filename>.txt | parallel aws s3 cp {} <output_dir>

The contents of my text file looked like this:

s3://bucket-name/file1.wav
s3://bucket-name/file2.wav
s3://bucket-name/file3.wav

Please make sure you don't have an empty line at the end of your text file. You can learn more about GNU parallel here.
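
The number of simultaneous transfers can be capped with parallel's -j option, and if the text file holds bare object keys rather than full s3:// URLs, sed can prepend the bucket first; a sketch, assuming a hypothetical keys.txt and bucket name:

sed 's|^|s3://bucket-name/|' keys.txt | parallel -j 8 aws s3 cp {} <output_dir>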

roronoa
  • 407
  • 9
  • 14
1

@Rajan's answer is a very good one; however, it fails when a line in the *.txt file has no match in the source s3 bucket. The code below also resolves this issue:

#!/bin/bash
# No set -e here, so a key with no match does not stop the loop
while IFS= read -r line; do
  aws s3 cp "s3://your-s3-source-bucket/folder/$line" s3://your-s3-destination/folder/
done < try.txt

The only thing you need to do is run the bash file inside your AWS notebook:

!chmod +x YOUR-BASH-NAME.sh
!./YOUR-BASH-NAME.sh
Sheykhmousa
  • 139
  • 9
-4

I got the problem solved; it may be a little bit stupid, but it works.

Using Python, I write multiple lines of AWS download commands into one single .sh file, then I execute it in the terminal.
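
For illustration, the generated .sh file might look something like this (hypothetical bucket and filenames; the Python script simply writes one aws s3 cp line per object):

#!/bin/bash
aws s3 cp s3://bucket-name/file1.csv ./downloads/
aws s3 cp s3://bucket-name/file2.csv ./downloads/
aws s3 cp s3://bucket-name/file3.csv ./downloads/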

DQI
  • 725
  • 1
  • 5
  • 7
  • 1
    you have plenty of ready-to-use SDKs [here](http://aws.amazon.com/code) on the Amazon website. – Evhz Jun 27 '16 at 07:00