I am trying to download only specific files from an AWS S3 bucket. I have a list of the file URLs. With the CLI I can download all files in a bucket using the --recursive flag, but I only want to download the files in my list. Any ideas on how to do that?
3 Answers
This is possibly a duplicate of: Selective file download in AWS S3 CLI
You can do something along the lines of:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2018-02-06*" --recursive

RyanWilliamWest
Thank you Ryan. However, the files in that case share a common characteristic: they all have the same date. My files don't; they are simply a subset of the files in the S3 bucket, and I have their URLs. – Sara Feb 06 '19 at 17:30
You don't have to use the * wildcard; you can specify the exact keys and use multiple --include flags in a single AWS CLI statement. I would wrap it in a shell or Python (boto3) script: open your list, build a single aws s3 cp command with an --include for each URL, and execute it. – RyanWilliamWest Feb 06 '19 at 17:40
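For illustration only, a minimal Python sketch of that single-command idea; the file name `urls.txt`, the use of subprocess to invoke the CLI, and the assumption that every URL points to the same bucket are mine, not part of the answer:

```python
import subprocess
from urllib.parse import urlparse

# Assumed input file (name is an assumption): one s3:// URL per line.
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# Assumes every URL points to the same bucket.
bucket = urlparse(urls[0]).netloc
keys = [urlparse(u).path.lstrip("/") for u in urls]

# Exclude everything, then re-include only the listed keys,
# so a single CLI invocation copies the whole subset.
cmd = ["aws", "s3", "cp", f"s3://{bucket}/", ".", "--recursive", "--exclude", "*"]
for key in keys:
    cmd += ["--include", key]

subprocess.run(cmd, check=True)
```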
@Sara You could use multiple `--include` parameters, specifying one per file to download. However, I don't think it would necessarily be any faster than grabbing one file at a time, because this functionality requires the AWS CLI to first list the existing objects (to support the wildcard matching), whereas copying a specific file needs only one API call. – John Rotenstein Feb 06 '19 at 23:50
Thank you! I ended up using boto3 and looping over my list. Using multiple --include flags was slower than that, and looping over the list with aws s3 cp was also slower. – Sara Feb 07 '19 at 20:53
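A minimal sketch of that kind of boto3 loop, assuming the URLs sit one per line in a file named `urls.txt` (the file name and the parsing details are assumptions, not Sara's actual code):

```python
import boto3
from urllib.parse import urlparse

s3 = boto3.client("s3")

# Assumed input file (name is an assumption): one s3:// URL per line.
with open("urls.txt") as f:
    for line in f:
        url = line.strip()
        if not url:  # skip blank lines
            continue
        parsed = urlparse(url)
        bucket, key = parsed.netloc, parsed.path.lstrip("/")
        # Download each object into the current directory under its base name.
        s3.download_file(bucket, key, key.split("/")[-1])
```

Reusing a single client keeps one session open for all downloads, instead of paying the startup and connection cost of a separate aws process per file.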
Since you already have the S3 URLs in a file (say, file.list), like:
s3://bucket/file1
s3://bucket/file2
you could download all the files to your current working directory with a simple bash loop:
while read -r line; do aws s3 cp "$line" .; done < file.list

abiydv
This works and it's what I have been using so far. However, it means that the aws command, which establishes a connection etc., has to be executed for every line, which is slow. I was wondering if there was a quicker way to go about it. – Sara Feb 06 '19 at 18:55
People, I found a quicker way to do it: https://stackoverflow.com/a/69018735
WARNING: "Please make sure you don't have an empty line at the end of your text file".
It worked here! :-)

Fernando
This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/31429940) – Muhammad Mohsin Khan Apr 04 '22 at 12:42