One method of ensuring a file in S3 is what it claims to be is to download it, get its checksum, and match the result against the checksum you were expecting.

Does AWS provide any service that allows this to happen without the user needing to first download the file? (i.e. ideally a simple request/url that provides the checksum of an S3 file, so that it can be verified before the file is downloaded)

What I've tried so far

I can think of a DIY solution along the lines of

  • Create an API endpoint that accepts a POST request with the S3 file url
  • Have the API run a lambda that generates the checksum of the file
  • Respond with the checksum value
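The DIY approach above could be sketched roughly as follows. This is only an illustration under stated assumptions: the handler name, the API Gateway proxy-style `event`, and the JSON body fields `bucket` and `key` are hypothetical (the steps above mention posting the S3 file URL, which would need parsing first). The object is hashed in chunks so memory use stays flat, though the Lambda timeout concern for large files still applies.

```python
import hashlib
import json


def md5_of_chunks(chunks):
    """Return the hex MD5 of an iterable of byte chunks without joining them."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def handler(event, context):
    """Hypothetical Lambda handler behind an API Gateway proxy integration.

    Assumes the POST body is JSON with "bucket" and "key" fields.
    """
    import boto3  # imported lazily so the hashing helper works without it

    body = json.loads(event["body"])
    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=body["bucket"], Key=body["key"])
    # iter_chunks streams the object body in pieces instead of reading it whole.
    checksum = md5_of_chunks(response["Body"].iter_chunks(chunk_size=1024 * 1024))
    return {"statusCode": 200, "body": json.dumps({"md5": checksum})}
```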

This may work, but it is already somewhat complicated and raises further considerations: for example, generating the checksum of a large file may take a long time (e.g. > 60 seconds).

I'm hoping AWS have some simple way of validating S3 files?

stevec

1 Answer

S3 creates an ETag for each object, which is often an MD5 digest of the object contents. It is returned as a response header, so it can be read without downloading the object body.

However, there are some exceptions.

From Common Response Headers - Amazon Simple Storage Service:

ETag: The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. Whether or not it is depends on how the object was created and how it is encrypted as described below:

  • Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-S3 or plaintext, have ETags that are an MD5 digest of their object data.

  • Objects created by the PUT Object, POST Object, or Copy operation, or through the AWS Management Console, and are encrypted by SSE-C or SSE-KMS, have ETags that are not an MD5 digest of their object data.

  • If an object is created by either the Multipart Upload or Part Copy operation, the ETag is not an MD5 digest, regardless of the method of encryption.
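For a publicly readable object, the ETag can be read with a plain HEAD request, which returns headers only. The sketch below assumes a public object URL (any URL passed to `fetch_etag` is up to the caller) and uses hypothetical helper names. It treats the ETag as an MD5 only when it is exactly 32 hex characters; note that this shape check cannot detect the SSE-C / SSE-KMS case described above, where the ETag is not an MD5 of the data.

```python
import re
import urllib.request


def fetch_etag(url):
    """Send a HEAD request and return the raw ETag header (or None)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get("ETag")


def etag_as_md5(etag):
    """Return the ETag as lowercase hex if it has the shape of an MD5, else None."""
    if etag is None:
        return None
    # S3 wraps the ETag value in double quotes.
    value = etag.strip().strip('"').lower()
    # A plain MD5 is 32 hex characters; multipart ETags carry a "-N" suffix.
    return value if re.fullmatch(r"[0-9a-f]{32}", value) else None


def matches_expected(etag, expected_md5):
    """True only when the ETag is MD5-shaped and equals the expected digest."""
    md5 = etag_as_md5(etag)
    return md5 is not None and md5 == expected_md5.lower()
```

A caller would pass the result of `fetch_etag(url)` to `matches_expected` along with the checksum it was expecting, and fall back to downloading and hashing the file when the ETag is not MD5-shaped.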

Also, the calculation of an ETag for a multi-part upload can be complex. See: s3cmd - What is the algorithm to compute the Amazon-S3 Etag for a file larger than 5GB? - Stack Overflow
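As a rough illustration of the scheme commonly reported for multipart ETags (it is not a documented guarantee): take the MD5 of each part, concatenate the raw digests, take the MD5 of that, and append `-` plus the number of parts. The function name below is hypothetical, the data is held in memory for simplicity, and `part_size` must match the part size used for the original upload, which is generally not recoverable from the object itself.

```python
import hashlib


def multipart_etag(data, part_size):
    """Compute the commonly reported multipart-style ETag for non-empty bytes."""
    # Raw (binary) MD5 digest of each part, in upload order.
    part_digests = [
        hashlib.md5(data[offset:offset + part_size]).digest()
        for offset in range(0, len(data), part_size)
    ]
    # MD5 of the concatenated digests, suffixed with the part count.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```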

John Rotenstein
  • Awesome answer. Am I correct to say that any files in S3 that are not either i) encrypted, ii) Multipart Upload, or iii) Part Copy operation **will** have an MD5 hash available (i.e. no exceptions for files that are not encrypted or Multipart Upload / Part Copy operation)? All the files I'll be dealing with are unencrypted (publicly available) files (and extremely unlikely to be multipart upload or part copy), so if I can confirm this, I can do away with those additional complexities. – stevec Jan 21 '20 at 02:37
  • Yes, under those circumstances, the ETag appears to behave as expected. – John Rotenstein Jan 21 '20 at 02:45