
Consider an S3 bucket containing objects with keys like:

abc_1_epoch1.ext
abc_1_epoch2.ext
abc_2_epoch1.ext
xyz_1_epoch1.ext

When I group the keys by prefix, the epoch suffix gives each group a lexicographical order. I want to delete every object except the one that comes last lexicographically within its group. So the expected contents of the bucket after the cleanup task are:

abc_1_epoch2.ext
abc_2_epoch1.ext
xyz_1_epoch1.ext

As you can see, the grouping keys are abc_1, abc_2 and xyz_1. Note that I have many millions of such objects in the bucket, so I need a scalable solution.
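To make the selection rule concrete, here is a minimal Python sketch of what I mean. It assumes the grouping key is everything before the last underscore (so `abc_1_epoch2.ext` belongs to group `abc_1`); the names `group_of` and `keys_to_delete` are just illustrative. It only computes which keys should go and does not talk to S3 at all.

```python
from typing import Dict, Iterable, List


def group_of(key: str) -> str:
    # Assumption: the group is everything before the last underscore,
    # e.g. "abc_1_epoch2.ext" -> "abc_1".
    return key.rsplit("_", 1)[0]


def keys_to_delete(keys: Iterable[str]) -> List[str]:
    """Return every key except the lexicographically last one per group."""
    latest: Dict[str, str] = {}  # group -> largest key seen so far
    doomed: List[str] = []
    for key in keys:
        group = group_of(key)
        current = latest.get(group)
        if current is None:
            latest[group] = key
        elif key > current:
            # The previous best is superseded, so it can be deleted.
            doomed.append(current)
            latest[group] = key
        elif key < current:
            doomed.append(key)
        # key == current: the same key seen twice, nothing to do.
    return doomed


if __name__ == "__main__":
    sample = [
        "abc_1_epoch1.ext",
        "abc_1_epoch2.ext",
        "abc_2_epoch1.ext",
        "xyz_1_epoch1.ext",
    ]
    print(keys_to_delete(sample))
```

The comparison is a plain string comparison, so `epoch10` would sort before `epoch2`; that matches the "last in lexicographical order" rule I described, but not a numeric one. This version also holds one entry per group in memory, which is the part I am unsure scales to millions of objects.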

Shubham
  • Fetching a list of millions of objects can be onerous so look at S3 Inventory, assuming you can tolerate results being a few hours old. Retrieve the inventory report, parse it, and produce your list of objects to delete. – jarmod Jan 12 '23 at 18:58
  • Also, consider a lifecycle rule to remove old objects, if a simple object-created check applies to the objects here. – Anon Coward Jan 12 '23 at 19:21
  • On the lifecycle option, see the "Why should I use object tags?" section of [Simplify your data lifecycle by using object tags with Amazon S3 Lifecycle](https://aws.amazon.com/blogs/storage/simplify-your-data-lifecycle-by-using-object-tags-with-amazon-s3-lifecycle/). – jarmod Jan 12 '23 at 20:01
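Following the inventory suggestion in the comments, one possible shape for the parsing step is sketched below in Python. It assumes the keys arrive already sorted (for example, inventory rows sorted beforehand, or a bucket listing, which S3 returns in UTF-8 binary order for general purpose buckets) and that the group is everything before the last underscore. The helper names `group_of`, `stale_keys` and `batches` are hypothetical. Because keys of one group share a common prefix, they are adjacent in sorted order, so the stream can be processed with constant memory. The batch size of 1000 matches the per-request limit of the S3 DeleteObjects API; the actual delete call is left out.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator, List


def group_of(key: str) -> str:
    # Assumption: the group is everything before the last underscore.
    return key.rsplit("_", 1)[0]


def stale_keys(sorted_keys: Iterable[str]) -> Iterator[str]:
    """Yield every key except the last one of each group.

    Requires the input to be sorted so that each group is contiguous
    and its lexicographically last key comes last.
    """
    for _, members in groupby(sorted_keys, key=group_of):
        previous = next(members)
        for key in members:
            yield previous  # a later key exists, so this one is stale
            previous = key
        # `previous` is now the final key of the group and is kept.


def batches(items: Iterable, size: int = 1000) -> Iterator[List]:
    """Split an iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    listing = sorted([
        "abc_1_epoch1.ext",
        "abc_1_epoch2.ext",
        "abc_2_epoch1.ext",
        "xyz_1_epoch1.ext",
    ])
    for batch in batches(stale_keys(listing)):
        # Each batch could be handed to a DeleteObjects request.
        print(batch)
```

Since the inventory can be a few hours old, a newer object may have arrived for a group after the report was generated. That is harmless here: only keys that were already superseded in the report are yielded, and they remain superseded.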
