12

The AWS S3 docs state that:

Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.

http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel

The timespan until full consistency is reached can vary. During this period GET requests may return the previous object or the updated object.

My question is:

When is the last-modified timestamp updated? Is it updated immediately after the overwrite PUT succeeds but before full consistency is reached, or is it only updated after full consistency is achieved?

I suspect the former but I can't find any documentation which clearly states this.

Michael - sqlbot
andrasp
  • @JohnRotenstein I appreciate the well-intentioned quote edit, but only that first sentence is from the AWS docs. The sentence starting 'The timespan until...' is my own writing. Please revert so that it doesn't mislead people. Thank you. – andrasp Nov 20 '16 at 01:23

2 Answers

13

The Last-Modified timestamp should match the Date value returned in the response headers from the successful PUT request.

To my knowledge, this is not explicitly documented, but it can be derived from what is documented.
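Since this is not documented, one way to gain confidence is to compare the two headers yourself. Below is a minimal sketch, assuming you have already captured the `Date` header from the PUT response and the `Last-Modified` header from a later GET or HEAD as raw strings. The function name and the one-second tolerance are my own illustrative choices, not anything S3 specifies.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def last_modified_matches_put_date(put_date_header: str,
                                   last_modified_header: str,
                                   tolerance_seconds: int = 1) -> bool:
    """Compare two HTTP-date header values.

    Returns True when the Date header of the PUT response and the
    Last-Modified header of a later read agree within the tolerance.
    The tolerance is an assumption to absorb whole-second rounding.
    """
    put_date = parsedate_to_datetime(put_date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    return abs(put_date - last_modified) <= timedelta(seconds=tolerance_seconds)


if __name__ == "__main__":
    # Example header values, made up for illustration.
    print(last_modified_matches_put_date(
        "Sun, 20 Nov 2016 01:23:45 GMT",
        "Sun, 20 Nov 2016 01:23:45 GMT",
    ))
```

How you fetch the headers depends on your client; the comparison itself only needs the standard library.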

When you overwrite an object, it's not the overwriting itself that may be delayed by the eventual consistency model -- it's the availability of the overwritten content at a given S3 node (S3 is replicated to multiple nodes within the S3 region).

But note that this answer was written in 2016, and in 2020, S3 announced that eventual consistency should no longer be a concern:

Effective immediately, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket. This applies to all existing and new S3 objects, works in all regions, and is available to you at no extra charge! There’s no impact on performance, you can update an object hundreds of times per second if you’d like, and there are no global dependencies.

https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/

The Last-Modified timestamp, like the rest of the metadata, is established when the object is created and is immutable thereafter.

It is, in fact, not the "modification" time of the object at all; it is the creation time of the object. The explanation may sound pedantic, but it is accurate in the strictest sense: S3 objects and their metadata cannot be modified at all, they can only be overwritten. When you "overwrite" an object in S3, what you are actually doing is creating a new object that reuses the old object's key (path + file name).
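To make the "overwrite is really create" idea concrete, here is a toy in-memory model. It is only an illustration of the semantics described above, not how S3 is implemented; `ToyBucket`, `StoredObject`, and the `now` parameter are hypothetical names I made up for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class StoredObject:
    """An immutable object: nothing can change after creation."""
    body: bytes
    last_modified: datetime  # really the creation time of this object


class ToyBucket:
    """Maps keys to immutable objects; a PUT always creates a new one."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, body: bytes,
            now: Optional[datetime] = None) -> StoredObject:
        # "Overwriting" builds a brand-new object and rebinds the key.
        # The timestamp is fixed here, at creation, and never touched again.
        created = now if now is not None else datetime.now(timezone.utc)
        obj = StoredObject(body=body, last_modified=created)
        self._objects[key] = obj
        return obj

    def head(self, key: str) -> StoredObject:
        return self._objects[key]


if __name__ == "__main__":
    bucket = ToyBucket()
    old = bucket.put("photo.jpg", b"v1")
    new = bucket.put("photo.jpg", b"v2")
    # The key now points at a different object with its own timestamp.
    print(old is new, bucket.head("photo.jpg").body)
```

In this model a reader that still sees the old object also sees the old timestamp, and a reader that sees the new object sees the new one; the timestamp never lags behind the content it belongs to.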

The official documentation uses very casual terminology here:

The object creation date or the last modified date, whichever is the latest.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html

That's just not correct in a literal sense, because objects themselves cannot be modified -- even "editing" object metadata creates an entirely new copy of the object with the new metadata. The content associated with a specific object key can be "modified" -- by overwriting the object -- and that's what they're actually speaking of, here.

Theoretically (writing now in 2023), replication delays are effectively a thing of the past, but then as now, Last-Modified would not have been impacted.

The availability of this new object at a given S3 node (replication) is what may be delayed by the eventual consistency model... not the actual creation of the new object that overwrites the old one... hence there would be no reason for Last-Modified to be impacted by a replication delay (assuming there is a replication delay -- eventual consistency can at times be indistinguishable from immediate consistency).

Michael - sqlbot
  • Fantastic write-up. Especially loved the phrase, "To my knowledge, this is not explicitly documented, but it can be derived from what is documented." Spent a decent amount of my day reading up on S3 guarantees and your post was one of several highlights. – John Zabroski Aug 28 '20 at 16:17
  • I'm confused, is this out of date? https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html explicitly says it's either creation time or modification time, *whichever is latest* – Joseph Garvin Jun 09 '23 at 01:02
  • 1
    @JosephGarvin no, they're just using words in a very casual sense. Objects in S3 are atomic and immutable and can't actually be modified, they can only be deleted or overwritten. The date the object was most recently created/overwritten/replaced is what's returned as `Last-Modified`. – Michael - sqlbot Jun 09 '23 at 17:53
0

This is something S3 does that is absolutely terrible.

On Linux, every file has an mtime, which is the time the file was last modified on the filesystem. Any S3 client could read the mtime and set the Last-Modified time on S3 from it, preserving when things were actually last modified.

Instead, Amazon bases it on object creation, which is a massive problem if you ever want to use the data as data outside of the original application that put it there.

If you download a file from S3, your client will likely set the modified time from Last-Modified, and if the file was uploaded to S3 as soon as it was created, you would at least have a nearly correct timestamp. In reality, though, you might take a picture and it might not get from your phone, through the app and the stack, to S3 for days!

And that is before considering re-uploading the file to S3, which compounds the problem, as you might re-upload it years later. S3 will then report a Last-Modified that is years later, even though the file was not actually modified.

They really need to allow you to set it. Instead, the documentation stays ambiguous on this point while being over-documented in other areas, which makes this hard to figure out.

https://github.com/s3tools/s3cmd/issues/524
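A common workaround, since Last-Modified cannot be set, is to carry the original mtime alongside the object in S3 user-defined metadata (the `x-amz-meta-*` headers) and reapply it after download. The sketch below covers only the local half: building the metadata dict and restoring the mtime. The key name `mtime` and both function names are hypothetical, and the actual upload and download calls are left to whatever client you use.

```python
import os

# Hypothetical user-defined metadata key; S3 would expose it as
# the header x-amz-meta-mtime. Pick any name your tools agree on.
MTIME_KEY = "mtime"


def mtime_metadata(path: str) -> dict:
    """Build a metadata dict holding the file's mtime in nanoseconds."""
    return {MTIME_KEY: str(os.stat(path).st_mtime_ns)}


def restore_mtime(path: str, metadata: dict) -> None:
    """Reapply a stored mtime to a downloaded file, keeping its atime."""
    mtime_ns = int(metadata[MTIME_KEY])
    atime_ns = os.stat(path).st_atime_ns
    os.utime(path, ns=(atime_ns, mtime_ns))
```

Metadata values are strings, so the mtime is stored as a decimal string of nanoseconds to avoid float rounding. Other clients will not know about this key, so the approach only helps tools that you control.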