In the boto/s3 module there is a function called set_contents_from_filename, which seems to take an MD5 hash as a parameter.

But the documentation is not clear on when/how to calculate the hash. Could someone help me with this? Also, how could I save the hash information to a file?

eikonomega
rgm

3 Answers

The set_contents_from_filename method will automatically calculate the MD5 checksum for you. The method has an optional md5 parameter that lets you pass in the MD5 if your application has already calculated it. If you don't pass a value in, boto will calculate it for you.

garnaat
  • When I looked at the log file, this is what I saw: "DEBUG: Headers: {'x-amz-meta-host_name': 'xp-vm', 'Content-Length': '35250', 'Expect': '100-Continue', 'Content-MD5': 'dzci3KDIAPWMdzWDaBaaJg==', 'Content-Type': 'application/octet-stream', 'User-Agent': 'Boto/2.6.0-dev (win32)'}". Is Content-MD5 the MD5 value? For the same file, when I run md5 in the console, the value is "dde831f630d056bb79d7d236f52135ff". – rgm Jan 18 '13 at 13:28
  • That is the MD5 as calculated by boto. By sending the MD5 in the request, it means that S3 can calculate the MD5 of what it receives and, if they don't match, S3 will return an error indicating that something was corrupted in the process. This is an important integrity check. I don't know why the MD5 seems to be different in the console. The only reason can be that the content itself is different for some reason. I'm confident that boto's MD5 calculation is correct. – garnaat Jan 18 '13 at 14:35
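
The two values quoted in the comments above are in different encodings, so they cannot be compared by eye: the Content-MD5 header carries the base64 encoding of the raw 16-byte digest, while console tools print the hex digest. A minimal sketch of converting the header value to hex for comparison, assuming Python 3 and only the standard library (the helper name content_md5_to_hex is made up for illustration and is not part of boto):

```python
import base64
import binascii
import hashlib


def content_md5_to_hex(content_md5):
    """Convert a base64 Content-MD5 header value to a hex digest string."""
    raw = base64.b64decode(content_md5)
    return binascii.hexlify(raw).decode("ascii")


# Sanity check on known data: both encodings come from the same raw digest.
digest = hashlib.md5(b"hello world").digest()
header_value = base64.b64encode(digest).decode("ascii")
print(content_md5_to_hex(header_value) == hashlib.md5(b"hello world").hexdigest())
```

If the converted header value still differs from the console output, the bytes that were hashed are not the same in the two cases.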

As @garnaat already said, the set_contents_from_filename method will automatically calculate the MD5 checksum for you.

If you look at the docs, there is a method called compute_md5, which returns a tuple containing the MD5 checksum as a hex digest (what you're getting in the console using md5sum) and also base64-encoded, which is the form boto sends to Amazon and what you're seeing in the headers.

The md5 parameter of the set_contents_from_filename method takes the MD5 checksum as a tuple, in the same format that compute_md5 returns. If you need to calculate it manually, the best way is to use the compute_md5 method. Otherwise, you have to build a tuple in the correct format yourself before passing it to the md5 parameter.
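
As a rough illustration of building such a tuple by hand and saving the hash to a file (which the question also asks about), here is a sketch assuming Python 3 and only the standard library. The helper names md5_tuple and save_md5 are hypothetical, and the md5sum-style output format is just one possible choice:

```python
import base64
import hashlib


def md5_tuple(path, chunk_size=8192):
    """Return (hex_digest, base64_digest) for the file at `path`."""
    md5 = hashlib.md5()
    with open(path, "rb") as fp:  # binary mode so the bytes are hashed unchanged
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            md5.update(chunk)
    b64 = base64.b64encode(md5.digest()).decode("ascii")
    return md5.hexdigest(), b64


def save_md5(path, out_path):
    """Write the hex digest of `path` to `out_path` in md5sum-style format."""
    hex_digest, _ = md5_tuple(path)
    with open(out_path, "w") as out:
        out.write("%s  %s\n" % (hex_digest, path))
    return hex_digest
```

With boto 2, the resulting tuple could presumably be passed as key.set_contents_from_filename(path, md5=md5_tuple(path)). That call is not exercised here, so treat it as an assumption to check against your boto version.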

mouckatron

The MD5 calculated by boto is the base64 encoding of the checksum. The 'Content-MD5' header value for a file that is about to be uploaded, or one that is already uploaded, can be calculated as follows:

import base64
import hashlib

from boto.s3.connection import S3Connection

conn = S3Connection(access_key, secret_key)
bucket = conn.get_bucket('bucket_name')

# To calculate the MD5 of a file already uploaded to S3
obj_key = bucket.get_key('file_name_in_s3')
content = obj_key.get_contents_as_string()
m = hashlib.md5()
m.update(content)
value = m.digest()  # raw 16-byte digest, not the hex string
remote_md5 = base64.b64encode(value)

# To calculate the MD5 of a file to be uploaded to S3
# (open in binary mode so the bytes are hashed unchanged)
with open('Local/Path/To/File', 'rb') as f:
    cur_md5 = base64.b64encode(hashlib.md5(f.read()).digest())
kk1957