5

I am trying to download only specific files from AWS S3. I have a list of the file URLs. With the CLI, the only approach I have found is to download every file in a bucket using the --recursive option, but I only want the files in my list. Any ideas on how to do that?

Sara

3 Answers

6

This is possibly a duplicate of: Selective file download in AWS S3 CLI

You can do something along the lines of:

aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2018-02-06*" --recursive

https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
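
When the wanted files share no common pattern, the same flags can in principle take exact paths, one --include per file. The sketch below only assembles such a command from a list; the function name `build_cp_command` and the sample keys are hypothetical, and it assumes the keys contain no wildcard characters (`*`, `?`, `[`), which the CLI would otherwise treat as patterns.

```python
import shlex


def build_cp_command(bucket, keys, dest="."):
    """Return an argv list for one `aws s3 cp --recursive` call limited to `keys`."""
    # Exclude everything first, then re-include each wanted key explicitly.
    cmd = ["aws", "s3", "cp", f"s3://{bucket}/", dest,
           "--recursive", "--exclude", "*"]
    for key in keys:
        cmd += ["--include", key]
    return cmd


if __name__ == "__main__":
    # Hypothetical keys, printed as a shell-quoted command line for inspection.
    argv = build_cp_command("BUCKET", ["reports/a.csv", "logs/b.txt"], "folder")
    print(shlex.join(argv))
```

The returned list can be handed to `subprocess.run(argv, check=True)`. For a very long list the command line may exceed the operating system's argument length limit, so the list might need to be split into batches.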

RyanWilliamWest
  • 1
    Thank you Ryan. However, the user in that case had files with a common characteristic: they all had the same date. My files don't have that; they are simply a subset of the files in the S3 bucket, and I have their URLs. – Sara Feb 06 '19 at 17:30
  • 2
    You don't have to use the * wildcard; you can specify the exact file names and use multiple --include options in a single aws s3 cp command. I would wrap it in a shell script, or a Python script using boto3, that opens your list, builds a single aws s3 cp command with one --include per URL, and executes it. – RyanWilliamWest Feb 06 '19 at 17:40
  • @Sara You could use multiple `--include` parameters, specifying one per file to download. However, I don't think it would necessarily be any faster than grabbing one at a time because this functionality requires the AWS CLI to first scan for existing files (to fit the wildcard capabilities), whereas copying a specific file would only need one API call. – John Rotenstein Feb 06 '19 at 23:50
  • Thank you! I ended up using boto3 and looping over my list. The multiple --include was slower than that. Looping over the list using aws s3 cp was also slower. – Sara Feb 07 '19 at 20:53
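
The boto3 loop mentioned in the last comment was not posted, so the following is only a minimal sketch of what such a loop might look like. The helper names (`split_s3_url`, `download_listed`, `main`) and the `file.list` filename are assumptions; `download_file` is the boto3 S3 client method. The client is created once and reused for every file, which avoids starting a new process per download.

```python
import os


def split_s3_url(url):
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    url = url.strip()
    if not url.startswith("s3://"):
        raise ValueError(f"not an s3:// URL: {url!r}")
    bucket, _, key = url[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {url!r}")
    return bucket, key


def download_listed(client, urls, dest_dir="."):
    """Download each URL with one shared client; return the local paths written."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        if not url.strip():
            continue  # tolerate blank lines in the list
        bucket, key = split_s3_url(url)
        # Files are flattened into dest_dir; keys with the same base name collide.
        local_path = os.path.join(dest_dir, os.path.basename(key))
        client.download_file(bucket, key, local_path)
        saved.append(local_path)
    return saved


def main(list_path="file.list", dest_dir="."):
    import boto3  # third-party; imported here so the helpers load without it

    with open(list_path) as fh:
        download_listed(boto3.client("s3"), fh, dest_dir)
```

Calling `main()` reads one `s3://` URL per line from the list file and uses whatever AWS credentials boto3 finds in the environment.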
1

Since you already have the S3 URLs in a file (say file.list), like this:

s3://bucket/file1
s3://bucket/file2

You could download all the files to your current working directory with a simple bash script -

while read -r line; do aws s3 cp "$line" .; done < file.list
abiydv
  • 7
    This works and it's what I have been using so far. However, this means that the aws command, which establishes a connection etc, has to be executed for every line, which is slow. I was wondering if there was a quicker way to go about it – Sara Feb 06 '19 at 18:55
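
One possible way around the per-file startup cost raised in this comment is to keep a single process and client and run several downloads at once. The sketch below is untested against real buckets and makes no claim about actual speed; the names `fetch_one` and `download_parallel` and the default of 8 workers are assumptions, and it relies on the boto3 documentation's statement that low-level clients can be shared between threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_s3_url(url):
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    url = url.strip()
    if not url.startswith("s3://"):
        raise ValueError(f"not an s3:// URL: {url!r}")
    bucket, _, key = url[len("s3://"):].partition("/")
    return bucket, key


def fetch_one(client, url, dest_dir):
    """Download a single object into dest_dir and return the local path."""
    bucket, key = split_s3_url(url)
    local_path = os.path.join(dest_dir, os.path.basename(key))
    client.download_file(bucket, key, local_path)
    return local_path


def download_parallel(client, urls, dest_dir=".", workers=8):
    """Download all non-blank URLs using a pool of threads and one shared client."""
    urls = [u for u in urls if u.strip()]
    os.makedirs(dest_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order; a worker's exception is re-raised
        # when its result is reached while building the list.
        return list(pool.map(lambda u: fetch_one(client, u, dest_dir), urls))
```

With boto3 installed, this could be called as `download_parallel(boto3.client("s3"), open("file.list"))`.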
-2

People, I found out a quicker way to do it: https://stackoverflow.com/a/69018735

WARNING: "Please make sure you don't have an empty line at the end of your text file".

It worked here! :-)

  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/31429940) – Muhammad Mohsin Khan Apr 04 '22 at 12:42