I'm trying to download a subset of files from a public S3 bucket that contains millions of IRS files. I can download the entire repository with the command:
aws s3 sync s3://irs-form-990/ ./
But it takes way too long!
I know I should be using the --include / --exclude flags, but I don't know how to use them with a list of values. I have a CSV that contains unique identifiers for all the files from 2017 that I'd like, but how do I use it with the AWS CLI? The list itself is half a million IDs long.
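To make the question concrete, here is a rough sketch of the kind of thing I'm picturing, though I don't know if it's the right approach. It skips --include entirely and builds one `aws s3 cp` command per ID. Several things here are assumptions for illustration only: the file name `ids.csv`, the ID being in the first column with no header row, the `./2017` output directory, and above all the guess that each object key is `<ID>_public.xml`. The `--no-sign-request` option is only there because the bucket is public.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not confirmed): the CSV has the ID in its first
# column with no header row, and each object key is "<ID>_public.xml".

# Print one "aws s3 cp" command per ID found in the CSV.
build_commands() {
  local csv="$1" dest="$2"
  # Strip carriage returns, take the first column, skip blank lines.
  tr -d '\r' < "$csv" | while IFS=, read -r id _ || [ -n "$id" ]; do
    [ -n "$id" ] || continue
    printf 'aws s3 cp s3://irs-form-990/%s_public.xml %s/ --no-sign-request\n' "$id" "$dest"
  done
}

# Dry run: inspect the first few generated commands.
#   build_commands ids.csv ./2017 | head
# Run them one after another:
#   build_commands ids.csv ./2017 | sh
```

Run serially like this, half a million separate commands would presumably still be slow, so I'd also welcome a way to batch or parallelize it.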
Help much appreciated. Thank you.