
My client has 60+ buckets. We pay hundreds of dollars per month to store this data, and we don't know how to easily distinguish what is useful from what is legacy.

Clicking through each bucket to find what is taking up space is tedious.

Is there a way to first list all files from all buckets and find what is taking the most space, so we can clear what is old and big?

Mathieu J.
    See [Amazon S3 Inventory](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html). Also, depending on the number of objects in your buckets, an [awscli query](https://stackoverflow.com/questions/53745277/aws-find-max-file-size-in-s3-bucket) might suffice. – jarmod Oct 12 '21 at 12:54
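For reference, the awscli query from that second link finds the largest object in a single bucket with a JMESPath expression; a minimal sketch (the bucket name is a placeholder):

```bash
# Print the key and size of the biggest object in one bucket.
# max_by is a standard JMESPath function; the CLI paginates automatically.
aws s3api list-objects-v2 \
    --bucket my-example-bucket \
    --query 'max_by(Contents, &Size).{Key: Key, Size: Size}'
```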

1 Answer


Initially, I did not find any easy way, so I have been using the Bash script below to build a list of files and calculate the total size of each bucket. The script is still useful to me because I have another script that parses its output further.

Note that if you have buckets above 500 GB with tons of files, the command `aws s3api list-object-versions` can run overnight and consume more than 15 GB of RAM. I don't know whether it would ever have completed; I don't have 32 GB, so I had to stop it. Otherwise, it's fast.
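If a bucket is that big, a lighter-weight way to get just its total size is the `--summarize` flag of `aws s3 ls`, which streams the listing instead of building one huge JSON document. A minimal sketch (the bucket name is a placeholder; unlike `list-object-versions` in the script below, it counts only current versions):

```bash
# Print "Total Objects" and "Total Size" for one bucket (current versions only).
aws s3 ls s3://my-example-bucket --recursive --summarize | tail -2
```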

But if you are only looking for your biggest buckets, the easiest way is within the AWS S3 console directly.

In the left navigation, under Storage Lens, click on Dashboards. You'll see your S3 dashboards. You might have one by default, like me, called default-account-dashboard; if you don't, you need to create one.

Open this dashboard and scroll to the bottom; you'll see your biggest regions and buckets.

*Screenshot: AWS S3 biggest regions and buckets*

Bash script

```bash
# store a clean list of bucket names for the current account in an env var
# (--query avoids fragile grep/cut parsing of the raw JSON)
s3_buckets=$(aws s3api list-buckets --query 'Buckets[].Name' --output text)

# list all objects (all versions) in all buckets, skipping buckets
# already downloaded by a previous run
for bucket in $s3_buckets
do
   echo "begin $bucket"
   [ -e "all_files_bucket_$bucket" ] && continue
   aws s3api list-object-versions --bucket "$bucket" > "all_files_bucket_$bucket"
   echo "completed download list for $bucket"
done

# display the sum of all file sizes per bucket, sorted with the biggest bucket last
for bucket in $s3_buckets
do
   echo "$(grep '"Size":' "all_files_bucket_$bucket" | awk -F: '{print $2}' | awk -F, '{print $1}' | paste -sd+ - | bc) bytes in $bucket"
done | sort -n
```
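If raw byte counts are hard to read, the totals can be converted to human-readable units with GNU coreutils' `numfmt`; a small sketch, assuming the listing files from the script above already exist and `$s3_buckets` is still set:

```bash
# Same per-bucket totals as above, but printed as KiB/MiB/GiB and sorted.
for bucket in $s3_buckets
do
   total=$(grep '"Size":' "all_files_bucket_$bucket" | awk -F: '{sum += $2} END {print sum+0}')
   printf '%s\t%s\n' "$(numfmt --to=iec "$total")" "$bucket"
done | sort -h
```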
Mathieu J.