
I'm trying to make a Python function and a Node.js function compute the same hash. However, the binary output from Node's crypto seems to differ from what Python's hashlib produces.

The python I'm using is:

import hashlib

hash = hashlib.sha512()
hash.update(salt)
hash.update(password.encode('utf8'))
hash.digest()

The Node/CoffeeScript is:

crypto = require 'crypto'
crypto.createHash('sha512').update(salt, 'binary').update(password, 'utf8').digest()

These lines should produce the same result, but for some reason they don't. Help?
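For reference, both snippets hash the concatenation of everything passed to update(): in hashlib (and, as far as I know, in Node's Hash.update() too), successive update() calls simply append more input to the message. The minimal Python sketch below checks this, assuming the example inputs salt = b'abc' and password = 'def' (the same values used in the answer); it is an illustration, not part of the original question.

```python
import hashlib

# Assumed example inputs (the question does not give concrete values).
salt = b'abc'        # salt as raw bytes
password = 'def'     # password as text, encoded to UTF-8 before hashing

# Incremental hashing, as in the question's Python code.
h = hashlib.sha512()
h.update(salt)
h.update(password.encode('utf8'))
incremental = h.digest()

# Hashing the concatenation in one call yields the same digest,
# because update() just appends to the message being hashed.
one_shot = hashlib.sha512(salt + password.encode('utf8')).digest()

print(incremental == one_shot)
print(len(incremental))  # SHA-512 digests are 64 bytes long
```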

Scott Arciszewski
Jon Chu
  • Could you include the results you get from running those two examples with example inputs? – Dan D. Nov 20 '12 at 01:15
  • When I run the exact same things, I get the same results—except that coffeescript returns a Unicode string (apparently Latin-1 decoded?), while Python (at least 3.x) returns a `bytes` value. So, I get things like `'\u0006'` vs. `'\x06'` for the third character. – abarnert Nov 20 '12 at 01:16
  • After testing, it's true for Python 2.x as well. – abarnert Nov 20 '12 at 01:30



They do seem to produce the same result, but because Node's digest() returns a Unicode string while Python's returns a bytes object, this may not be immediately obvious:

CoffeeScript 1.4.0 on Node 0.8.11:

coffee> salt='abc'
'abc'
coffee> password='def'
'def'
coffee> d = crypto.createHash('sha512').update(salt, 'binary').update(password, 'utf8').digest()
'ã.ñ#èí&ezK=\u0007­»v\u0018\u0006CWEVNAP §\u0003¾*}¶\u001e=9\f+¹~-L1\u001fÜiÖ±&\u0005õ© ç'

Python 3.3.0:

>>> salt, password=b'abc', 'def'
>>> hash = hashlib.sha512()
>>> hash.update(salt)
>>> hash.update(password.encode('utf8'))
>>> d = hash.digest()
>>> print(d)
b'\xe3.\xf1\x96#\xe8\xed\x9d&\x7fez\x81\x94K=\x07\xad\xbbv\x85\x18\x06\x8e\x88CWEVN\x8dAP\xa0\xa7\x03\xbe*}\x88\xb6\x1e=9\x0c+\xb9~-L1\x1f\xdci\xd6\xb1&\x7f\x05\xf5\x9a\xa9 \xe7'

Looks pretty different, right? But if you look closely, the printable characters are the same; that CWEVN run is hard to miss. And you can see even more similarities if you decode it as Latin-1…

>>> print(d.decode('latin1'))
ã.ñ#èí&ezK=­»vCWEVNAP §¾*}¶=9
                                   +¹~-L1ÜiÖ±&õ© ç

It's clearly the exact same string; Node is just escaping the non-printable characters.

And Python 2.7.2:

>>> salt, password='abc', u'def'
>>> hash = hashlib.sha512()
>>> hash.update(salt)
>>> hash.update(password.encode('utf8'))
>>> d = hash.digest()
>>> print(d)
?.?#??&ez??K=??v???CWEVN?AP???*}??=9
                                 +?~-L1?iֱ&? ?
>>> print(d.decode('latin1'))
ã.ñ#èí&ezK=­»vCWEVNAP §¾*}¶=9
                                   +¹~-L1ÜiÖ±&õ© ç

Again, same string.
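A way to sidestep the display question entirely is to compare hex strings rather than raw digests: hexdigest() in Python and, on the Node side, digest('hex'). Hex output is plain ASCII, so it looks identical in any terminal or language. A minimal Python sketch, again assuming salt = b'abc' and password = 'def'; the Node call appears only as a comment and was not run here.

```python
import hashlib

salt = b'abc'      # assumed example salt (bytes)
password = 'def'   # assumed example password (text)

h = hashlib.sha512()
h.update(salt)
h.update(password.encode('utf8'))

# Roughly equivalent Node call (not executed here):
#   crypto.createHash('sha512').update(salt, 'binary')
#         .update(password, 'utf8').digest('hex')
hex_digest = h.hexdigest()
print(hex_digest)

# The hex string and the raw digest carry exactly the same information.
assert bytes.fromhex(hex_digest) == h.digest()
```

Comparing the two hex strings character by character then tells you directly whether the hashes match.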

Given that my terminal, C locale, etc. are all UTF-8 (this is OS X), I have no idea why CoffeeScript is decoding as Latin-1.
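As far as I know, this is not about the locale at all: before Node 0.10, Hash.digest() with no encoding argument returned a 'binary' string, in which each byte becomes one character whose code point equals the byte value (U+0000 to U+00FF). That mapping is exactly Latin-1. (Later Node versions return a Buffer instead.) The Python sketch below emulates that conversion; `to_node_binary` and `from_node_binary` are hypothetical helper names for illustration only, not part of any library.

```python
import hashlib

def to_node_binary(data: bytes) -> str:
    # Hypothetical helper: one character per byte, code point == byte value,
    # mimicking a legacy Node 'binary' string.
    return ''.join(chr(b) for b in data)

def from_node_binary(s: str) -> bytes:
    # Hypothetical inverse; raises ValueError if any code point exceeds 255.
    return bytes(ord(c) for c in s)

# Assumed example inputs: salt b'abc', password 'def'.
d = hashlib.sha512(b'abc' + 'def'.encode('utf8')).digest()
s = to_node_binary(d)

# The per-byte mapping is identical to decoding the digest as Latin-1...
assert s == d.decode('latin1')
# ...and it round-trips losslessly back to the original bytes.
assert from_node_binary(s) == d
assert s.encode('latin1') == d
print(len(s))  # one character per digest byte
```

So a digest string produced this way on the Node side can be turned back into bytes in Python with s.encode('latin1').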

abarnert