
I wanted to use the SHA-1 algorithm to calculate the checksum of some data. The thing is that in Python, `hashlib` input is given as a string.

Is it possible to calculate SHA-1 in Python but somehow give raw bytes as input?

I am asking because if I wanted to calculate the hash of a file in C, I would use the OpenSSL library and just pass normal bytes. In Python I need to pass a string, so if I calculated the hash of a specific file, I would get different results in the two languages.

Andna
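In Python 3 the distinction the question worries about is explicit: `hashlib` hash objects accept bytes-like objects, and opening a file in binary mode (`"rb"`) yields the raw bytes stored on disk. The digest should therefore match what OpenSSL's SHA-1 computes over the same file in C. A minimal sketch, assuming Python 3; the helper name `sha1_of_file` and the 64 KiB chunk size are illustrative choices, not from the original thread.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of a file's raw bytes (hypothetical helper)."""
    h = hashlib.sha1()
    # "rb" mode: read the bytes exactly as stored on disk, with no text decoding
    with open(path, "rb") as f:
        # Feed the file in chunks so large files need not fit in memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Hashing in chunks with repeated `update()` calls gives the same digest as hashing the whole file in one call, so the chunk size only affects memory use.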
  • Well, you could convert the text in the file to ASCII and pass it through `hashlib`. I think it only makes a difference for encodings other than single-byte ones. – Morten Jensen Mar 10 '13 at 16:11
  • I did some more reading, and after viewing this question: http://stackoverflow.com/questions/2672326/what-does-leading-x-mean-in-a-python-string-xaa I think I can use the struct module to build this byte-string representation of anything and pass it to hashlib. Correct me if I am wrong. – Andna Mar 10 '13 at 16:14
  • Unless you know for certain that the file you want to hash uses a multi-byte charset, you can just pass it to the hashing function as you would in C. – Morten Jensen Mar 10 '13 at 16:16
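Andna's idea of using `struct` does work: `struct.pack` turns a Python value into a fixed byte layout, which can then be hashed directly. A sketch assuming Python 3 and a little-endian unsigned 32-bit integer (`"<I"`); which byte order matches the C side depends on how that program serializes the value, so `"<I"` here is an assumption.

```python
import hashlib
import struct

value = 42
# "<I": little-endian unsigned 32-bit int (4 bytes, no padding)
raw = struct.pack("<I", value)   # b'*\x00\x00\x00'  (0x2a is '*')
digest = hashlib.sha1(raw).hexdigest()
print(digest)
```

Packing with `">I"` (big-endian) yields different bytes and hence a different digest, so both languages must agree on the byte layout, not just on the integer value.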

1 Answer


In Python 2.x, str objects can be arbitrary byte streams. So yes, you can just pass the data into the hashlib functions as strs.

>>> import hashlib
>>> "this is binary \0\1\2"
'this is binary \x00\x01\x02'
>>> hashlib.sha1("this is binary \0\1\2").hexdigest()
'17c27af39d476f662be60be7f25c8d3873041bb3'
tom
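The session above is Python 2, where `str` is a byte string. In Python 3, `str` is Unicode text and `hashlib` rejects it with a `TypeError`; raw bytes are written as a `bytes` literal instead. A minimal Python 3 sketch of the same call, which should yield the same digest as the Python 2 session because the input bytes are identical:

```python
import hashlib

data = b"this is binary \0\1\2"   # bytes literal: each escape is one raw byte
print(data)                        # b'this is binary \x00\x01\x02'
print(hashlib.sha1(data).hexdigest())

# Passing a str instead of bytes fails in Python 3
try:
    hashlib.sha1("this is binary \0\1\2")
except TypeError as exc:
    print("TypeError:", exc)
```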
  • Thanks. Knowing this, I can also use the struct module to compute the hash of, for example, a 4-byte integer. – Andna Mar 10 '13 at 16:21
  • @Andna you could do that with your old C hashing library too. I'm not sure why you'd involve the `struct` module for any of this. Can you elaborate? – Morten Jensen Mar 10 '13 at 21:10
  • Are you asking why I want to do this in Python? – Andna Mar 10 '13 at 23:59
  • No you cannot. Try `hashlib.sha1(u'\xcc\x88u')` and see. – Sassa NF Mar 13 '17 at 10:50
  • @SassaNF that's because the `u'...'` causes the string to be a `unicode` object instead of a `str` object. To turn a `unicode` object into a `str`, you'll need to encode it. e.g. `hashlib.sha1(u'\xcc\x88u'.encode("utf-8"))` – tom Mar 15 '17 at 22:11
  • Right. `'\xcc\x88u'` was already meant to be a UTF-8 encoding of a Unicode sequence U+0308 and `u`. So I started from the wrong place, and need to work out why it ends up `u'...'`, not `'...'`. – Sassa NF Mar 16 '17 at 13:13
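The exchange above comes down to one rule: hash functions operate on bytes, so text must be encoded first, and the encoding chosen determines the bytes and therefore the digest. A Python 3 sketch using the U+0308 + `u` sequence from Sassa NF's comment; UTF-16-LE is included only as an illustrative contrast, and the last line reproduces the mistake of treating already-encoded bytes as text.

```python
import hashlib

text = "\u0308u"                   # COMBINING DIAERESIS followed by "u"
utf8 = text.encode("utf-8")        # b'\xcc\x88u'
utf16 = text.encode("utf-16-le")   # b'\x08\x03u\x00': same text, different bytes

print(hashlib.sha1(utf8).hexdigest())
print(hashlib.sha1(utf16).hexdigest())   # differs from the UTF-8 digest

# Mistake: the UTF-8 bytes written as a text literal, then encoded again
mis = "\xcc\x88u".encode("utf-8")  # b'\xc3\x8c\xc2\x88u', not the intended bytes
```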