11

I want to store hashes as binary (64 bytes), but for any kind of API (web service) I want to pass them around as strings. A hashlib hash object's hexdigest() gives me a string, and its digest() gives me the binary. But if, for example, I read the binary version in from disk, how do I convert it to a string? And if I read the string in from a web service, how do I convert it to binary?

esac

4 Answers

10

You might want to look into the binascii module, specifically its hexlify and unhexlify functions.
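As a minimal sketch of how those two functions could be used, assuming Python 3.3 or later and SHA-512 as the 64-byte hash (the question does not name the algorithm, so that choice is an assumption):

```python
import binascii
import hashlib

# Assumption: SHA-512, because it yields the 64-byte digest mentioned in the question.
raw = hashlib.sha512(b"hello world").digest()   # raw bytes, length 64

# hexlify returns bytes containing ASCII hex digits; decode to get a str.
hex_str = binascii.hexlify(raw).decode("ascii")

# unhexlify turns the hex text back into the raw bytes.
# (It accepts an ASCII-only str as well as bytes in Python 3.3+.)
restored = binascii.unhexlify(hex_str)

print(len(raw), len(hex_str))
print(restored == raw)
```

The decode("ascii") step matters on Python 3 because hexlify returns bytes, not str.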

lormus
7

In 2.x you can use str.decode('hex') and str.encode('hex') to convert between raw bytes and a hex string. In 3.x those str methods no longer accept the 'hex' codec, so use the binascii module instead (or bytes.fromhex and, from 3.5 on, bytes.hex).

Ignacio Vazquez-Abrams
4

You could start with the string version to pass around and display:

>>> import hashlib
>>> string_version = hashlib.md5(b'hello world').hexdigest()

Encode it as bytes to write it to disk (note that this stores the 32 ASCII hex characters, not the 16-byte raw digest):

>>> save_as_binary = string_version.encode('utf-8')
>>> print(save_as_binary)
b'5eb63bbbe01eeed093cb22bb8f5acdc3'

When reading it back from disk, convert it back to a string:

>>> back_to_string = save_as_binary.decode('utf-8')
>>> print(back_to_string)
5eb63bbbe01eeed093cb22bb8f5acdc3
Raymond Hettinger
  • 6
    To clarify: `hashlib.md5(b'hello world').hexdigest().decode('hex') == hashlib.md5(b'hello world').digest()` – Ben Aug 15 '16 at 16:58
  • @Ben Thanks a lot. This has saved me a lot of time. I am working on AWS S3 and was trying to figure out how the ETag conversion happens from string->binary->string. There are a lot of answers online, but nothing was working for me. When I tried your answer, it worked just fine. So thank you. – anirudha sonar Mar 09 '18 at 09:06
  • 2
    The statement `string_version.encode('utf-8')` does _not_ deliver the binary interpretation of the hexdigest. It just delivers a binary string of the hex string. save_as_binary is not the same as digest() which was asked for. – Johannes Overmann Jun 20 '19 at 21:48
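The distinction raised in the last comment can be checked directly. The following is a small sketch for Python 3 (variable names are illustrative only) comparing the UTF-8 encoding of the hex string with the raw digest:

```python
import hashlib

h = hashlib.md5(b"hello world")
hex_str = h.hexdigest()

# Encoding the hex string gives the ASCII codes of its 32 characters.
ascii_bytes = hex_str.encode("utf-8")

# Interpreting the hex string as hex gives the 16 raw digest bytes.
raw = bytes.fromhex(hex_str)

print(len(ascii_bytes), len(raw))      # 32 16
print(ascii_bytes == h.digest())       # False
print(raw == h.digest())               # True
```

Storing ascii_bytes works as a round trip, but it takes twice the space and is not the binary form the question asks about.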
0

Some existing answers here are missing the point. The digest is bytes and the hexdigest is a str:

>>> from hashlib import md5
>>> h = md5(b"hello world")
>>> h.digest()
b'^\xb6;\xbb\xe0\x1e\xee\xd0\x93\xcb"\xbb\x8fZ\xcd\xc3'
>>> h.hexdigest()
'5eb63bbbe01eeed093cb22bb8f5acdc3'

Going from digest (bytes) to hexdigest (str), use bytes.hex (available since Python 3.5):

>>> h.digest().hex()
'5eb63bbbe01eeed093cb22bb8f5acdc3'

Going from hexdigest (str) to digest (bytes), use bytes.fromhex:

>>> bytes.fromhex(h.hexdigest())
b'^\xb6;\xbb\xe0\x1e\xee\xd0\x93\xcb"\xbb\x8fZ\xcd\xc3'
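Applied to the scenario in the question, a round trip through a file might look like the sketch below. The SHA-512 choice (for a 64-byte digest), the temporary directory, and the file name digest.bin are all assumptions made for illustration:

```python
import hashlib
import os
import tempfile

# Assumption: SHA-512, which produces a 64-byte digest.
digest = hashlib.sha512(b"hello world").digest()

# Hypothetical storage location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "digest.bin")

# Store the raw bytes; the file must be opened in binary mode.
with open(path, "wb") as f:
    f.write(digest)

# Read the raw bytes back and convert them to a hex string for an API.
with open(path, "rb") as f:
    from_disk = f.read()
as_string = from_disk.hex()

# A hex string received from a web service converts back the same way.
from_api = bytes.fromhex(as_string)

print(len(from_disk), len(as_string))
print(from_api == digest)
```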
wim