I tried using this post to find the last modified file and then use awk to extract the folder it's contained in: Get last modified object from S3 using AWS CLI
But this isn't ideal for more than 1000 folders and, according to the documentation, should fail. I have 2000+ folder objects to search through. My desired folder always begins with a D followed by an incrementing number, e.g. D1200.
The answer there led me to this call, which works:
aws s3 ls main.test.staging/General_Testing/Results/ --recursive | sort | tail -n 1 | awk '{print $4}'
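As an aside on the parsing step: if the keys follow a fixed layout, the folder name can be cut out of the key this pipeline prints without any regex. A minimal sketch, assuming keys look like General_Testing/Results/D1200/<file> (so the D-folder is the third "/"-separated field); `folder_from_key` is a hypothetical helper name, not part of the AWS CLI:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one S3 key per line and print its folder segment.
# Assumption: keys look like General_Testing/Results/D1200/<file>, so the
# D-folder is the 3rd "/"-separated field. Change -f3 if the depth differs.
folder_from_key() {
  cut -d/ -f3
}

# Usage, reusing the pipeline from the question:
# aws s3 ls main.test.staging/General_Testing/Results/ --recursive \
#   | sort | tail -n 1 | awk '{print $4}' | folder_from_key
```

This only removes the regex step; the recursive listing itself is just as slow.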
but it takes over 40 seconds to go through thousands of files, and I then need to regex-parse the output to get the folder object rather than the last file modified inside it. Also, if I try to do this to find my desired folder (which is the object right after the Results object):
aws s3 ls main.test.staging/General_Testing/Results/ | sort | tail -n 1
Then my output will be D998, because sort orders the folder names like this:
D119
D12
D13
Technically D12 sorts after D119 because its second digit is a 2 rather than a 1: sort compares the names character by character, not as numbers. Following this logic, there's no way I can use that call to reliably retrieve the highest-numbered folder and therefore the last one created. Note also that folder objects that contain files do not have a Last Modified tag that one can use to query.
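One way to sidestep the character-by-character ordering is to sort on the number alone. A minimal sketch, assuming the non-recursive `aws s3 ls` listing prints each folder as a `PRE D1200/` line and that every relevant folder is named D followed only by digits; `latest_d_folder` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `aws s3 ls` output on stdin and print the
# highest-numbered D<digits> folder name.
# Assumption: folders appear as "PRE D1200/" lines in the non-recursive listing.
latest_d_folder() {
  awk '$1 == "PRE" { print $2 }' |  # keep only common-prefix ("folder") lines
    sed 's#/$##' |                  # drop the trailing slash
    grep -E '^D[0-9]+$' |           # keep names shaped like D<digits>
    sed 's/^D//' |                  # strip the leading D
    sort -n |                       # numeric sort: 12 < 119 < 998 < 1200
    tail -n 1 |                     # largest number
    sed 's/^/D/'                    # put the D back
}

# Usage (bucket and prefix from the question):
# aws s3 ls main.test.staging/General_Testing/Results/ | latest_d_folder
```

This still lists every prefix under Results/, so it fixes the ordering but not necessarily the speed. GNU coreutils `sort -V` (version sort) would also place D12 before D119 without stripping the D first.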
To be clear, my question is: what call can I use to search a large number of S3 objects and find the highest-numbered folder object? Ideally the answer is fast, works with 1000+ objects, and doesn't require breaking the output down with a regex.