1

I am interested in the traffic lifecycle of objects (i.e. when the objects were created and deleted). One approach is to scan the bucket periodically, explicitly track each object's lastModifiedTime, and diff against the previous scan result to identify deleted objects.
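
To make the scan-and-diff idea concrete, here is a minimal sketch. It assumes each scan has already been reduced to a dict mapping object key to lastModifiedTime; `diff_scans` and the sample keys are hypothetical names used only for illustration.

```python
def diff_scans(previous, current):
    """Compare two scans, each a dict of object key -> lastModifiedTime."""
    # Keys present now but not before were created since the last scan.
    created = sorted(k for k in current if k not in previous)
    # Keys present before but not now were deleted since the last scan.
    deleted = sorted(k for k in previous if k not in current)
    # Keys in both scans whose timestamp changed were overwritten.
    modified = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return {"created": created, "deleted": deleted, "modified": modified}


if __name__ == "__main__":
    old_scan = {"a.txt": "2020-05-14T10:00:00Z", "b.txt": "2020-05-14T11:00:00Z"}
    new_scan = {"b.txt": "2020-05-15T09:30:00Z", "c.txt": "2020-05-15T12:00:00Z"}
    print(diff_scans(old_scan, new_scan))
```

Note that this only places a deletion somewhere between two scans, and an object that is both created and deleted between scans is never seen at all.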

Another alternative I was considering is to enable S3 event notifications. However, the data in the notification does not contain the lastModifiedTime of the object. Can eventTime be used as a proxy instead? Is there a guarantee on how quickly the event is sent? In my case, it is acceptable if delivery of the event is delayed, as long as eventTime is not significantly later than the modification time of the object.

Also, are there any other alternatives for capturing the lifecycle of S3 objects?

alwaysAStudent

2 Answers

5

Yeah, the eventTime is a pretty good approximation of the lastModifiedTime of an object. One caveat is that lastModifiedTime is defined as:

Object creation date or the last modified date, whichever is the latest.

So in order to use eventTime as an approximation, you probably need a trigger that covers all the events where an object is either created or modified. Regarding your question of how quickly the event is sent, here is a quote from the S3 documentation:

Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.
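
As a rough sketch of reading eventTime out of a notification, the snippet below parses a notification message body and pulls out the fields relevant to lifecycle tracking. The function name `lifecycle_records` is hypothetical, and the sample payload is trimmed to just the fields used here. Object keys arrive URL-encoded in the notification, hence the `unquote_plus` call.

```python
import json
from datetime import datetime
from urllib.parse import unquote_plus


def lifecycle_records(message_body):
    """Extract (event name, event time, bucket, key) from an S3 notification."""
    event = json.loads(message_body)
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # eventTime is an ISO-8601 UTC string ending in "Z"; rewrite the suffix
        # so datetime.fromisoformat accepts it on older Python versions too.
        event_time = datetime.fromisoformat(
            record["eventTime"].replace("Z", "+00:00")
        )
        results.append({
            "event_name": record["eventName"],
            "event_time": event_time,
            "bucket": s3["bucket"]["name"],
            # Keys are URL-encoded in the notification ("a b" -> "a+b").
            "key": unquote_plus(s3["object"]["key"]),
        })
    return results


if __name__ == "__main__":
    sample = json.dumps({"Records": [{
        "eventName": "ObjectCreated:Put",
        "eventTime": "2020-05-15T22:25:00.000Z",
        "s3": {"bucket": {"name": "example-bucket"},
               "object": {"key": "logs/red+flower.jpg"}},
    }]})
    print(lifecycle_records(sample))
```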

If you need the exact lastModifiedTime, you have to do a headObject operation for each object.
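
If you want to check how far eventTime drifts from the real value, something like the following could work. It is a sketch that assumes a boto3 S3 client is passed in (its `head_object` call returns `LastModified` as a timezone-aware datetime); `event_time_skew` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def event_time_skew(s3_client, bucket: str, key: str,
                    event_time: datetime) -> timedelta:
    """Return eventTime minus the object's actual LastModified timestamp."""
    # head_object fetches metadata only; the object body is not downloaded.
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return event_time - response["LastModified"]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   skew = event_time_skew(boto3.client("s3"), "example-bucket", "a.txt", event_time)
```

This only applies to creation events: after a delete, the object is gone and the headObject call fails. If the key was overwritten again in the meantime, the result reflects the newer write.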

Your first approach, the periodic scan, could work, but be careful not to do it naively if you have millions of objects. By that I mean don't call listObjects in a while loop: it doesn't scale at all, and the listObjects API is pretty expensive. If you only need to do this traffic analysis once a day or once a week, I recommend using S3 Inventory. The lastModifiedTime is available in the inventory report. [ref]

jellycsc
3

There is no guarantee for how long it takes to deliver the events. From the docs:

Amazon S3 event notifications are designed to be delivered at least once. Typically, event notifications are delivered in seconds but can sometimes take a minute or longer.

Also, events occurring at the same time may be represented by a single event:

If two writes are made to a single non-versioned object at the same time, it is possible that only a single event notification will be sent. If you want to ensure that an event notification is sent for every successful write, you can enable versioning on your bucket. With versioning, every successful write will create a new version of your object and will also send an event notification.
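
Since delivery is at least once, a consumer may also see the same notification more than once. Below is a minimal sketch of dropping duplicates. The choice of identity tuple (bucket, key, event name, and the `sequencer` field carried on create and delete records) is an assumption made for illustration, and the in-memory set stands in for whatever durable store you would actually use.

```python
def dedupe_records(records):
    """Drop repeated deliveries of the same S3 event record, keeping order."""
    seen = set()
    unique = []
    for record in records:
        obj = record["s3"]["object"]
        # Assumed identity of an event: a redelivered record carries the
        # same values for all four of these fields.
        identity = (
            record["s3"]["bucket"]["name"],
            obj["key"],
            record["eventName"],
            obj.get("sequencer"),
        )
        if identity in seen:
            continue
        seen.add(identity)
        unique.append(record)
    return unique
```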

Marcin
  • 1
    Is the `eventTime` in the event the delivery time, or is it the time the event was generated? I thought it was the latter... I don't mind a delay in delivery as long as `eventTime` is guaranteed to be within a few seconds of the `lastModifiedTime` of the object – alwaysAStudent May 15 '20 at 22:25
  • @learningMyWayThru I see what you mean. I think jellycsc addressed that in his answer well. – Marcin May 15 '20 at 22:28