How do I download files with AWS CLI based on a list?

Question

I'm trying to download a subset of files from a public S3 bucket that contains millions of IRS files. I can download the entire repository with the command:

aws s3 sync s3://irs-form-990/ ./

But it takes way too long!

I know I should be using the --include / --exclude flags, but I don't know how to use them with a list of values. I have a CSV that contains unique identifiers for all the files from 2017 that I'd like, but how do I use it with the AWS CLI? The list itself is half a million IDs long.

Help much appreciated. Thank you.

Perhaps something like this: https://stackoverflow.com/a/54559662/5354201 or this: https://gist.github.com/rpbaptist/d21276a6d110afbffff67aefc284eabd The idea is to use a script to iterate over the filenames in your list and download them. Not sure what the performance would be vs. using a regular filter and a single command, but it may be an option to investigate. — trademark, Jul 08 '20 at 13:34

score 2 · Accepted Answer · answered Jul 08 '20 at 13:39


Here is a bash script that reads the filenames from a file, filename.txt, one per line. All you have to do is convert those IDs into filenames.

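#!/bin/bash
# Stop at the first failed command (including a failed download).
set -e
# Read filename.txt one line at a time; each line is an object key.
while IFS= read -r line
do
   # bucket-name and dest-path are placeholders for your bucket and local directory.
   aws s3 cp "s3://bucket-name/$line" dest-path/
done < filename.txt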

This question has been asked before; you can find the answer here.
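For the step of converting IDs into filenames, a minimal sketch might look like the following. It is only an illustration under assumptions that are not in the original post: the function name ids_to_keys is made up, the CSV is assumed to have a header row with the ID in the first column, and each object key is assumed to be the ID followed by a fixed suffix. Check the real key names in the bucket before relying on it.

```shell
#!/bin/bash
# ids_to_keys CSV_FILE SUFFIX   (hypothetical helper, not part of the AWS CLI)
# Print one object key per line, built from the first column of CSV_FILE.
# Assumptions: the first row is a header, the ID is the first
# comma-separated field, and each key is "<ID><SUFFIX>".
ids_to_keys() {
    awk -F',' -v suffix="$2" '
        NR > 1 {
            sub(/\r$/, "", $1)              # drop a trailing carriage return (Windows line endings)
            if ($1 != "") print $1 suffix   # skip blank rows
        }
    ' "$1"
}

# Example (ids.csv and the "_public.xml" suffix are assumptions):
#   ids_to_keys ids.csv "_public.xml" > filename.txt
```

The resulting filename.txt can then be fed to the loop above.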


D A


This is my first time writing a bash script, and I'm a little confused. What is "$line dest-path/ " supposed to be? Is it supposed to be the location of filename.txt? I also got the error " does not existerror occurred (404) when calling the HeadObject operation: Key "20161239.xml". If that means that particular file doesn't exist, is there a way to skip it and try the next one? – ethan tenison Jul 09 '20 at 16:53
I have answered your question, and this is the right way to do it. As for how to write a bash script, what a loop is, what a variable is, or that "dest-path" is the path where your files should be copied, you will need to do some research yourself. – D A Jul 09 '20 at 18:14

Okay, well, it doesn't work even when I set the dest-path as the place I'd like them copied, so there's no need to be rude. – ethan tenison Jul 10 '20 at 13:57
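
On skipping files that don't exist: with set -e, the script stops at the first failed aws s3 cp. A rough sketch of a loop that logs the failure and moves on is below. The function name download_keys and its arguments are hypothetical. The overlapping text in the quoted 404 message also looks like what a terminal prints when the key ends in a carriage return, which would happen if filename.txt was saved with Windows line endings, so the sketch strips those as well. That diagnosis is a guess from the error text, not something confirmed in the thread.

```shell
#!/bin/bash
# download_keys LIST_FILE BUCKET DEST_DIR   (hypothetical helper)
# Copy every key listed in LIST_FILE from s3://BUCKET/ into DEST_DIR,
# skipping keys whose download fails instead of stopping.
download_keys() {
    list_file="$1"; bucket="$2"; dest="$3"
    # "|| [ -n "$key" ]" also handles a last line with no trailing newline.
    while IFS= read -r key || [ -n "$key" ]; do
        key=$(printf '%s' "$key" | tr -d '\r')   # strip carriage returns (Windows line endings)
        [ -z "$key" ] && continue                # ignore blank lines
        # </dev/null keeps the command from consuming the list on stdin.
        if ! aws s3 cp "s3://${bucket}/${key}" "${dest}/" < /dev/null; then
            echo "skipped: ${key}" >&2           # report the failure and carry on
        fi
    done < "$list_file"
}

# Example (bucket-name and dest-path are the placeholders from the answer):
#   download_keys filename.txt bucket-name dest-path 2> skipped.log
```

Redirecting stderr to a file, as in the example, leaves a list of the keys that could not be downloaded.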
