140

I have a use case where I programmatically bring up an EC2 instance, copy an executable from S3, run it, and shut down the instance (all done in user-data). I need to get only the most recently added file from S3.

Is there a way to get the last modified file/object from an S3 bucket using the AWS CLI tool?


5 Answers

296

You can list all the objects in the bucket with aws s3 ls $BUCKET --recursive:

$ aws s3 ls $BUCKET --recursive
2015-05-05 15:36:17          4 an_object.txt
2015-06-08 14:14:44   16322599 some/other/object
2015-04-29 12:09:29      32768 yet-another-object.sh

They're sorted alphabetically by key, but that first column is the last modified time. A quick sort will reorder them by date:

$ aws s3 ls $BUCKET --recursive | sort
2015-04-29 12:09:29      32768 yet-another-object.sh
2015-05-05 15:36:17          4 an_object.txt
2015-06-08 14:14:44   16322599 some/other/object

tail -n 1 selects the last row, and awk '{print $4}' extracts the fourth column (the name of the object).

$ aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'
some/other/object

Last but not least, drop that into aws s3 cp to download the object:

$ KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
$ aws s3 cp s3://$BUCKET/$KEY ./latest-object
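
For the question's user-data scenario, the pieces can be combined into a small boot script. The bucket name, download path, and the assumption that the AWS CLI and an instance role with s3:ListBucket/s3:GetObject permissions are already in place are illustrative, not part of the original question:

#!/bin/bash
# Hypothetical user-data sketch: fetch the most recently modified object,
# run it, then power the instance off (user-data runs as root on first boot).
BUCKET=my-bucket   # hypothetical bucket name
KEY=$(aws s3 ls "$BUCKET" --recursive | sort | tail -n 1 | awk '{print $4}')
aws s3 cp "s3://$BUCKET/$KEY" /tmp/latest-executable
chmod +x /tmp/latest-executable
/tmp/latest-executable
shutdown -h now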
David Murray
  • Brilliant post. Particularly useful due to the explanations of each command. Thanks. – Christian Feb 02 '17 at 11:51
  • S3 only indexes objects by key. If the bucket has enough objects that a "full table scan" to find the one you're looking for is impractical, you'll need to build a separate index of your own. The laziest option I can think of is to put the key of the most recently written object in s3://$BUCKET/current after you've written it, and have readers look there to find which one they should pull (a sketch follows these comments). – David Murray Mar 28 '17 at 22:22
  • Just a side note, if you want to do the same thing for a whole "folder", `awk` will need to select the second element (instead of 4th) and `--recursive` will be needed, e.g., `KEY=\`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $2}'\` ; aws s3 cp s3://$BUCKET/$KEY ./latest-object --recursive` – David Arenburg Apr 24 '17 at 10:24
  • This won't work on buckets with more than 1000 items, because that is the most that can be returned https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html – nico Sep 03 '19 at 08:29
  • isn't this going to pose problems on a bucket with a HUGE number of objects? – Scott Aug 10 '20 at 19:34
  • is there a windows batch file version of this available? – tunawolf Sep 29 '20 at 12:06
  • Thanks for such a great explanation. I am getting confused in the last part where the cp command is used. What is the key in the command & where to get it from? – Cloud Wanderer Mar 11 '23 at 11:44
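
A minimal sketch of the pointer-object approach David Murray describes in the comment above; the bucket variable, key names, and the use of stdin/stdout streaming with aws s3 cp are illustrative assumptions:

# Writer: upload the object, then record its key in a fixed "pointer" object
aws s3 cp ./payload "s3://$BUCKET/releases/payload-42"
echo "releases/payload-42" | aws s3 cp - "s3://$BUCKET/current"

# Reader: resolve the pointer, then fetch the object it names
KEY=$(aws s3 cp "s3://$BUCKET/current" -)
aws s3 cp "s3://$BUCKET/$KEY" ./latest-object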
68

Updated answer

Some time later, here is a slightly more elegant way to do it:

aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text

Instead of an extra reverse() call, we can take the last entry of the sorted list via [-1].
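
For example (same hypothetical bucket name), the key it prints can be captured and handed straight to aws s3 cp:

KEY=$(aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text)
aws s3 cp "s3://my-awesome-bucket/$KEY" ./latest-object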


Old answer

This command does the job without any external dependencies:

aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'reverse(sort_by(Contents, &LastModified))[:1].Key' --output=text
Roman Shishkin
  • Excellent. If you also need the object name to match a certain string: `--query 'reverse(sort_by(Contents[?contains(Key, \`myKey\`)], &LastModified))[:1].Key'` – bfcapell Oct 27 '19 at 18:39
  • --query is executed locally, so if you have more than 1000 files in the bucket you are not guaranteed to get the last modified ones first. – Gismo Ranas Mar 13 '20 at 10:45
  • @GismoRanas Good point. The regular `--filter` option can be applied to reduce a list – Roman Shishkin Jul 23 '20 at 10:56
  • This one works in Windows cmd if you wrap the query in double quotes rather than single. – Max xaM Mar 15 '21 at 19:09
  • For large buckets it is recommended to use a hierarchical key naming scheme so you can take advantage of the `--prefix` option in order to reduce the searched key list (a sketch follows these comments). – Yuri Oct 04 '21 at 11:23
  • this should be the accepted answer – 333kenshin Mar 29 '22 at 02:58
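
A sketch of the --prefix narrowing suggested in the comment above, assuming a hypothetical date-based key layout such as 2021/10/...:

aws s3api list-objects-v2 --bucket "my-awesome-bucket" --prefix "2021/10/" --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text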
11
aws s3api list-objects-v2 --bucket "bucket-name" | jq -r '.Contents | max_by(.LastModified) | .Key'
AlexLoo
  • If you've never met jq before, it's a json processor https://stedolan.github.io/jq/ – andrew lorien Apr 16 '18 at 08:44
  • I think `list-objects-v2` has a limit on max items, so if your bucket has more objects than that - this might not get an accurate answer – Gilad Peleg May 13 '18 at 08:55
  • https://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects-v2.html states (as of this writing) that the maximum limit _per page_ is 1000. Also note that the output has `IsTruncated` set to true if more keys are available to return. – Ashutosh Jindal Dec 05 '19 at 16:08
0

If this is a freshly uploaded file, you can use Lambda to execute a piece of code on the new S3 object.

If you really need to get the most recent one, you can name your files with the date first, sort by name in descending order, and take the first object.
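
A sketch of that naming scheme, with hypothetical bucket and file names: put a sortable UTC timestamp at the front of the key, and a reverse sort on the names alone then surfaces the newest upload:

aws s3 cp ./build.bin "s3://my-bucket/builds/$(date -u +%Y-%m-%dT%H-%M-%SZ)-build.bin"
aws s3 ls s3://my-bucket/builds/ | awk '{print $4}' | sort -r | head -n 1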

Jonathan Turpie
  • This is unfortunately not a freshly uploaded file. I will need the last uploaded file which could have been uploaded at anytime. – wishy Jun 26 '15 at 00:30
-2

The following is a bash script that downloads the latest file from an S3 bucket. I used the aws s3 sync command instead, so that it does not download the file from S3 if it already exists locally.

--exclude "*" excludes all the files

--include "*$FILE_NAME*" includes only the files matching the pattern

#!/usr/bin/env bash

BUCKET="s3://my-s3-bucket-eu-west-1/list/"
TARGET_FILE_PATH=target/datdump/
TARGET_FILE=${TARGET_FILE_PATH}localData.json.gz

# Name of the most recently modified object under the prefix
FILE_NAME=$(aws s3 ls $BUCKET | sort | tail -n 1 | awk '{print $4}')

echo $FILE_NAME
echo $TARGET_FILE

# Sync only that one file; sync skips the download if the file already exists locally
aws s3 sync $BUCKET $TARGET_FILE_PATH --exclude "*" --include "*$FILE_NAME*"

# Copy it to a stable local name
cp ${TARGET_FILE_PATH}${FILE_NAME} $TARGET_FILE

p.s. Thanks @David Murray

AjitChahal
  • Very inefficient. My folder is 29T large. Sorting should be done on the server side, not by downloading all of the files and piping into sort. – Banoona Jul 25 '22 at 09:22