
I was looking into writing an MD5sum to the object after it has been uploaded to AWS S3 and doing a data integrity check as described here:

https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/

For example, say you uploaded 100,000 objects to an AWS S3 bucket and you want to run MD5 to do a data integrity check. Is there an additional cost for doing this? Does it add to the count of requests for PUT, LIST, etc.?

Edward_178118

1 Answer


Background - Adding Metadata

You can only set metadata when the object is being uploaded (see the text copied from the Amazon documentation below). If you want to add metadata to an existing object, you have to make a copy of the object with the new metadata, which replaces the old version.

Each Amazon S3 object has data, a key, and metadata. The object key (or key name) uniquely identifies the object in a bucket. Object metadata is a set of name-value pairs. You can set object metadata at the time you upload it. After you upload the object, you cannot modify object metadata. The only way to modify object metadata is to make a copy of the object and set the metadata.

Because setting metadata is part of the PUT request, there is no additional charge for it: PUT requests are charged per request rather than by data volume.
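The copy step described above can be sketched as follows. This is a minimal illustration, not something from the AWS article: the helper name `build_metadata_copy_kwargs`, the bucket and key names, and the `md5` metadata key are all hypothetical. It only builds the arguments for boto3's `copy_object` call, copying the object onto its own key with `MetadataDirective="REPLACE"`, so it runs without credentials.

```python
def build_metadata_copy_kwargs(bucket, key, new_metadata):
    """Build arguments for an in-place copy that replaces the object's metadata.

    Hypothetical helper: the names here are illustrative only.
    """
    return {
        "Bucket": bucket,                                  # destination bucket
        "Key": key,                                        # destination key (same object)
        "CopySource": {"Bucket": bucket, "Key": key},      # source is the existing object
        "Metadata": new_metadata,                          # new user-defined metadata
        "MetadataDirective": "REPLACE",                    # use the metadata given here
    }


copy_kwargs = build_metadata_copy_kwargs(
    "example-bucket",
    "example/key.txt",
    {"md5": "5eb63bbbe01eeed093cb22bb8f5acdc3"},
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("s3").copy_object(**copy_kwargs)
```

Note that each such copy is itself a request, so adding metadata after the fact to 100,000 existing objects would mean 100,000 extra requests, whereas setting it during the original upload adds none.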

AWS-Supported Method for Validating Uploaded Data

AWS supports MD5 validation of data uploaded to S3, which is described here, as well as in the S3 API PUT documentation. In short, you:

  • Calculate the MD5 locally
  • Include the MD5 in the upload request, which AWS checks for you
  • Optionally include the MD5 as metadata in your upload

If the object fails the MD5 check, the response from S3 includes an error.
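The steps above can be sketched in Python. This is a rough example under stated assumptions: the helper names `md5_for_s3` and `build_put_kwargs`, the bucket and key names, and the `md5` metadata key are hypothetical. S3 expects the Content-MD5 value as the base64 encoding of the raw 128-bit digest, not the hex string, so the sketch produces both forms and only builds the arguments for boto3's `put_object`.

```python
import base64
import hashlib
from typing import Tuple


def md5_for_s3(data: bytes) -> Tuple[str, str]:
    """Return (base64 digest for Content-MD5, hex digest for metadata)."""
    digest = hashlib.md5(data).digest()
    return base64.b64encode(digest).decode("ascii"), digest.hex()


def build_put_kwargs(bucket: str, key: str, data: bytes) -> dict:
    """Build upload arguments carrying the MD5 for server-side checking.

    Hypothetical helper: the names here are illustrative only.
    """
    content_md5, md5_hex = md5_for_s3(data)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": data,
        "ContentMD5": content_md5,        # S3 checks the body against this value
        "Metadata": {"md5": md5_hex},     # optional: keep the checksum with the object
    }


put_kwargs = build_put_kwargs("example-bucket", "example/key.txt", b"hello world")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("s3").put_object(**put_kwargs)
```

The checksum travels in the same PUT request as the data, so the validation does not add a separate request per object.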

Costs for AWS MD5 Validation

The S3 pricing page does not mention any costs for MD5 validation, so the only answer I can give you is "no it's not charged for".

Tim
  • So I take it this applies to aws sync too, where you can't update the object with new data and update the md5sum in the object with a sync option; it must be a put with a new object? – Edward_178118 Jan 13 '20 at 22:17
  • My reading suggests that metadata can only be associated with an object during the PUT operation, that is, when uploading an object. Sync is a command line program rather than an API call; I expect it uses a combination of list, get, and put calls to provide the sync feature. – Tim Jan 13 '20 at 22:42