
Does Amazon provide an easy way to extract a list of all folders that contain files greater than 500 MB from an S3 bucket? I also want to limit the scope to the '/files/ftp_upload/' directories. This is so I can calculate my costs, etc.

I tried this, but it didn't help much:

aws s3 ls s3://YOUR_BUCKET/YOUR_FOLDER/ --recursive --human-readable --summarize

What is the best approach here?

sam23

1 Answer


S3 does not have a concept of "folders"; the console only presents the data as folders by splitting object keys on the forward slash ("/"). So summarizing data by "folder" requires parsing the key of each object.

You could pull ALL of the data for objects that match your key prefix and then perform some logic & math to sum it together, but that is a lot of work.
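For example, here is a rough sketch of that approach using boto3. The bucket name, prefix, and 500 MB threshold are placeholders taken from your question, and grouping by "folder" just means splitting on the last "/" in each key:

    import boto3
    from collections import defaultdict

    # Placeholders taken from the question; adjust to your bucket and prefix.
    BUCKET = "YOUR_BUCKET"
    PREFIX = "files/ftp_upload/"
    THRESHOLD = 500 * 1024 * 1024  # 500 MB in bytes

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    # "Folders" containing at least one object over the threshold,
    # mapped to the combined size of those large objects.
    large_by_folder = defaultdict(int)

    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            if obj["Size"] > THRESHOLD:
                # Treat everything before the last "/" in the key as the "folder".
                folder = obj["Key"].rsplit("/", 1)[0] + "/"
                large_by_folder[folder] += obj["Size"]

    for folder, total in sorted(large_by_folder.items()):
        print(f"{folder}\t{total / 1024 ** 2:.1f} MB")

Keep in mind this makes one LIST request per 1,000 objects, so it can be slow (and cost a little) on very large buckets.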

If you don't need the data in real time, S3 Inventory may provide a good solution. Basically, you get a list of the objects in the specified S3 bucket, output in one of three formats: CSV, Apache ORC, or Apache Parquet. You can then do much easier computations based on that data.

The downside to S3 Inventory is that it takes a day or so to get the report and it is not real-time.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html
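If you go the Inventory route, parsing the report is simple. Here is a minimal sketch that assumes a gzipped CSV inventory data file downloaded locally, with the optional Size field enabled; the file name and column positions are assumptions, since the actual column order is listed in the manifest.json fileSchema for your inventory configuration:

    import csv
    import gzip
    from urllib.parse import unquote_plus

    # Hypothetical local copy of one gzipped inventory data file.
    INVENTORY_FILE = "inventory-data.csv.gz"
    THRESHOLD = 500 * 1024 * 1024  # 500 MB in bytes

    # Inventory CSV files have no header row; the column order comes from the
    # "fileSchema" in manifest.json. Here we assume "Bucket, Key, Size".
    KEY_COL, SIZE_COL = 1, 2

    with gzip.open(INVENTORY_FILE, mode="rt", newline="") as f:
        for row in csv.reader(f):
            # Key names in CSV inventory output are URL-encoded.
            key = unquote_plus(row[KEY_COL])
            size = int(row[SIZE_COL])
            if key.startswith("files/ftp_upload/") and size > THRESHOLD:
                print(key, size)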

If you are looking for an easier way to manage costs, you may want to break the use case up, with one S3 bucket per use case. Then you could utilize Cost Allocation Tags at the bucket level.

Also S3 utilization reporting is helpful for determining if you should be using a different storage class.

Tim P
  • I just want to filter out the files which are greater than 500 MB from my bucket's upload folder. Is there any advice on that? – sam23 May 04 '22 at 06:29
  • Belated, but for anyone else reading this: you can add the `Size` optional field to your S3 inventory, and in your inventory parsing script, check the value of that field. – Illya Moskvin Aug 24 '23 at 19:00