60

I am having trouble downloading multiple files from AWS S3 buckets to my local machine.

I have a list of all the filenames that I want to download, and I do not want the others. How can I do that? Is there any kind of loop in the aws-cli that lets me do some iteration?

There are a couple of hundred files I need to download, so it does not seem possible to use a single command that takes all the filenames as arguments.

DQI
  • 725
  • 1
  • 5
  • 7
  • you can look at `aws s3api get-object` if you're able to filter/query the list of your files. If you have the list in a file, you can read the file line by line and pipe it to `aws s3 cp s3://yourbucket/-` – Frederic Henri Jun 24 '16 at 20:46
  • Does this answer your question? [how to include and copy files that are in current directory to s3 (and not recursively)](https://stackoverflow.com/questions/21711300/how-to-include-and-copy-files-that-are-in-current-directory-to-s3-and-not-recur) – Channa Aug 24 '20 at 17:09
  • @FredericHenri could you elaborate on how to read a file in this case? – TechNewbie Sep 15 '22 at 21:38

8 Answers

77

One can also use the --recursive option, as described in the documentation for the cp command. It will copy all objects under a specified prefix recursively.

Example:

aws s3 cp s3://folder1/folder2/folder3 . --recursive

will grab all files under folder1/folder2/folder3 and copy them to the local directory.
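
Since a recursive copy pulls everything under the prefix (and transfers are billed), the --dryrun flag of aws s3 cp can be used to preview what would happen first; a minimal sketch using the same path as above:

aws s3 cp s3://folder1/folder2/folder3 . --recursive --dryrun

Each output line shows a copy that would be performed; drop --dryrun once the list looks right.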

Eldad Assis
  • 10,464
  • 11
  • 52
  • 78
siphiuel
  • 3,480
  • 4
  • 31
  • 34
  • 3
    Powerful, but an (obvious?) warning to use with care. AWS charges for every in/out file transfer. And when combined with the `rm` command, check your syntax to avoid accidental deletion! – AlainD Sep 19 '19 at 09:34
  • Is there any specific order in which the files will be copied? For files having an ending number (file_1, file_2, ...), will they be copied in order, or can nothing be said about it? – VictorHMartin Dec 13 '21 at 10:54
  • 1
    this doesn't answer the question, which wants to only copy specific files, by name/filepath. – Hugh Perkins May 31 '22 at 21:15
35

You might want to use "sync" instead of "cp". The following will download/sync only the files with the ".txt" extension into your local folder:

aws s3 sync --exclude="*" --include="*.txt" s3://mybucket/mysubbucket .
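
Since --exclude and --include can be given more than once, the same approach can also pull a specific set of named files; a sketch, assuming hypothetical object names in the same bucket:

aws s3 sync s3://mybucket/mysubbucket . --exclude="*" --include="report-01.txt" --include="report-02.txt"

For a couple of hundred names this gets unwieldy on a single command line, which is where the loop-based answers below come in.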
f.cipriani
  • 3,357
  • 2
  • 26
  • 22
  • 6
    I would like to use the above command to copy just 100 files (for example). Is there a clever way/parameter that can be used to do this? – Paul Pritchard Feb 19 '18 at 15:13
33

Here is a bash script which reads all the filenames from a file filename.txt and downloads them one by one:

#!/bin/bash
set -e                 # stop at the first failed copy
while read -r line     # one object key per line
do
  aws s3 cp "s3://bucket-name/$line" dest-path/
done < filename.txt
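
If the sequential loop is too slow, the same list can be fed to xargs so several copies run at once; a sketch, assuming the same filename.txt with one object key per line, where -P sets how many aws s3 cp processes run in parallel:

xargs -P 8 -I {} aws s3 cp "s3://bucket-name/{}" dest-path/ < filename.txt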
Andrea Bergonzo
  • 3,983
  • 4
  • 19
  • 31
Rajan
  • 392
  • 2
  • 5
  • Thanks, this is definitely a better way to do that. – DQI Jun 25 '16 at 18:37
  • Yes, this is a much better way to do it compared to all the other answers posted here. Thank you for sharing this answer. – Shabbir Bata May 29 '18 at 14:39
  • 2
    `set -e stops the execution of a script if a command or pipeline has an error - which is the opposite of the default shell behaviour, which is to ignore errors in scripts` – Mr_and_Mrs_D Dec 05 '18 at 03:11
  • 5
    This will download files one after the other - need a way to do it in parallel – Mr_and_Mrs_D Dec 05 '18 at 03:18
  • 4
    It's too slow; maybe it's taking time locating those files. Is there any way I can send multiple file requests at the same time, so the files are located and downloaded in parallel? – user3085459 Sep 09 '19 at 09:45
  • Yea, this is too slow. What I really want is to use the --include flag, but match a list of a million files. – Luke Kurlandski Jun 01 '23 at 15:37
30

As per the docs, you can use include and exclude filters with s3 cp as well. So you can do something like this:

aws s3 cp s3://bucket/folder/ . --recursive --exclude="*" --include="2017-12-20*"

Make sure you get the order of exclude and include filters right as that could change the whole meaning.
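
To see why the order matters, here is a sketch contrasting the two orderings on the same prefix; filters given later take precedence:

aws s3 cp s3://bucket/folder/ . --recursive --exclude="*" --include="2017-12-20*"   # copies only the 2017-12-20 objects
aws s3 cp s3://bucket/folder/ . --recursive --include="2017-12-20*" --exclude="*"   # the trailing exclude wins, so nothing is copied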

Milo
  • 3,365
  • 9
  • 30
  • 44
Chinmay B
  • 427
  • 4
  • 7
  • 2
    Your last line _"Make sure you get the order of exclude and include filters right as that could change the whole meaning."_ is especially helpful (I had them reversed). Thanks. – newfie_coder Jun 21 '18 at 15:46
6

Tried all the above. Not much joy. Finally, adapted @rajan's reply into a one-liner:

for file in whatever*.txt; do aws s3 cp "$file" s3://somewhere/in/my/bucket/; done
Hugh Perkins
  • 7,975
  • 7
  • 63
  • 71
5

I wanted to read s3 object keys from a text file and download them to my machine in parallel.

I used this command:

cat <filename>.txt | parallel aws s3 cp {} <output_dir>

The contents of my text file looked like this:

s3://bucket-name/file1.wav
s3://bucket-name/file2.wav
s3://bucket-name/file3.wav

Please make sure you don't have an empty line at the end of your text file. You can learn more about GNU parallel here.
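
The number of simultaneous transfers can be capped with parallel's -j option, and if the text file holds bare object keys rather than full s3:// URLs, sed can prepend the bucket first; a sketch, assuming a hypothetical keys.txt and bucket name:

sed 's|^|s3://bucket-name/|' keys.txt | parallel -j 8 aws s3 cp {} <output_dir>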

roronoa
  • 407
  • 9
  • 14
1

@Rajan's answer is a very good one; however, it fails when a line in the *.txt file has no match in the source s3 bucket. The code below also resolves this issue:

#!/bin/bash
# No set -e here, so a key with no match does not stop the loop
while IFS= read -r line; do
  aws s3 cp "s3://your-s3-source-bucket/folder/$line" s3://your-s3-destination/folder/
done < try.txt

The only thing you need to do is run the bash file inside your AWS notebook:

!chmod +x YOUR-BASH-NAME.sh
!./YOUR-BASH-NAME.sh
Sheykhmousa
  • 139
  • 9
-4

I got the problem solved; it may be a little bit stupid, but it works.

Using Python, I write multiple lines of AWS download commands into one single .sh file, then I execute it in the terminal.
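
For illustration, the generated .sh file might look something like this (hypothetical bucket and filenames; the Python script simply writes one aws s3 cp line per object):

#!/bin/bash
aws s3 cp s3://bucket-name/file1.csv ./downloads/
aws s3 cp s3://bucket-name/file2.csv ./downloads/
aws s3 cp s3://bucket-name/file3.csv ./downloads/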

DQI
  • 725
  • 1
  • 5
  • 7
  • 1
    you have plenty of ready-to-use SDKs [here](http://aws.amazon.com/code) on the Amazon website. – Evhz Jun 27 '16 at 07:00