
I'm trying to delete multiple (thousands of) files from an Amazon S3 bucket. I have the file names listed in a file like so:

name1.jpg
name2.jpg
...
name2020201.jpg

I tried the following solution:

aws s3 rm s3://test-bucket --recursive --exclude "*" --include "data/*.*" 

from this question, but --include only takes one argument. I tried to get hacky and list the names like --include "name1.jpg", but this does not work either.

Neither does this approach:

aws s3 rm s3://test-bucket < file.txt

Can you help?

  • did you try this `aws s3 rm s3://test-bucket --recursive --exclude "*" --include "data/*.jpg"`? – Jatin Mehrotra Apr 26 '21 at 15:21
  • Yes, I actually did it without the "=" sign, which is correct, and I will fix it. But that example does not solve the problem in the question; the answer does – magdazelena Apr 27 '21 at 06:52

2 Answers


I figured this out with this simple bash script:

#!/bin/bash
set -e
# Read one object key per line from files.txt and delete it
while IFS= read -r line
do
   aws s3 rm "s3://test-bucket/$line"
done < files.txt

Inspired by this answer. The answer is: delete one at a time!
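
A side note, not from the original answer: aws s3 rm supports a --dryrun flag, so a minimal sketch for previewing what the loop would delete (same files.txt and bucket as above) is:

# --dryrun only prints the operations instead of executing them
while IFS= read -r line
do
   aws s3 rm --dryrun "s3://test-bucket/$line"
done < files.txt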


The following approach is actually much faster since my first answer took ages to complete.

My first approach was to delete one file at a time with the rm command. This is not efficient: after around 15 h (!) it had deleted only around 40,000 records, which was 1/5 of the total.

This approach by Norbert Preining is waaay faster. As he explains, it uses the s3api command delete-objects, which can bulk-delete objects in storage and accepts up to 1,000 keys per request. It takes a JSON object as an argument. To turn the list of file names into the required JSON object, the script uses the JSON processor jq (read more here). The script processes 500 records per iteration.

cat file-with-names | while mapfile -t -n 500 ary && ((${#ary[@]})); do
        # Build the {"Objects": [{"Key": ...}, ...]} payload for the current batch of 500
        objdef=$(printf '%s\n' "${ary[@]}" | ./jq-win64.exe -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
        aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done
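
For reference, this is the payload shape the jq pipeline produces. A minimal sketch, assuming jq is on your PATH (the script above calls a local jq-win64.exe binary instead), with -c added for compact output:

# Feed two sample names through the same jq program used above
printf '%s\n' name1.jpg name2.jpg | jq -cnR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}'
# {"Objects":[{"Key":"name1.jpg"},{"Key":"name2.jpg"}]}

This is exactly the structure delete-objects expects for its --delete argument.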