I'm trying to convert an MD5 hashed value into a bit integer in Python. Does anyone have any idea how I would go about doing this?

I currently go through several ngrams applying a hash to each ngram:

import hashlib

for sentence in doc:
    for i in range(len(sentence) - 4 + 1):
        ngram = sentence[i:i + 4]
        hashWord = hashlib.md5()
        # update() needs bytes; this assumes each n-gram is a str
        hashWord.update(ngram.encode('utf-8'))

Thanks for any help.

djcmm476

1 Answer

If by "into bits" you mean a bit string, for instance, then something like this should work:

import hashlib

a = hashlib.md5(b'alsdkfjasldfjkasdlf')  # md5 needs bytes in Python 3
b = a.hexdigest()
as_int = int(b, 16)  # the digest as a single integer
print(bin(as_int)[2:])  # strip the '0b' prefix
# 11110000110010001100111010111001011010101011110001010000011010010010100111100
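
If a plain integer or a fixed-width bit string is what you're after, here is a minimal sketch, assuming Python 3 (for `int.from_bytes`). The helper names `md5_to_int` and `md5_to_bits` are made up for illustration and are not part of `hashlib`:

```python
import hashlib

def md5_to_int(data: bytes) -> int:
    """Return the MD5 digest of data as a non-negative integer."""
    # digest() gives the 16 raw bytes; read them as one big-endian number
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def md5_to_bits(data: bytes) -> str:
    """Return the MD5 digest as a 128-character string of 0s and 1s."""
    # format(..., "0128b") zero-pads on the left to the full 128 bits
    return format(md5_to_int(data), "0128b")

value = md5_to_int(b"alsdkfjasldfjkasdlf")
bits = md5_to_bits(b"alsdkfjasldfjkasdlf")
```

Note that `bin(as_int)[2:]` drops leading zeros, so its length can vary from hash to hash; padding to 128 digits keeps every bit string the same width, which may or may not matter for your use.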
Jon Clements
  • I think he maybe just wants a big int ... but not sure ... (+1 all the same). Either way, he should be able to get his answer here – Joran Beasley Nov 03 '12 at 21:28
  • @JoranBeasley Yup - I was thinking that, and if that's the case, the OP can work with `as_int`... The bitstring was just an example – Jon Clements Nov 03 '12 at 21:28
  • Wow, this takes me back. Getting an integer from a hash was the [very first question](http://stackoverflow.com/questions/4612150/python-256bit-hash-function-with-number-output/4612189#4612189) I answered on SO! – DSM Nov 03 '12 at 21:29
  • @DSM wow! Well, with 20k on the clock you've come a long way since then! – Jon Clements Nov 03 '12 at 21:30
  • Clements, you hero, you. That's exactly what I was looking for. A correct answer tick to you, sir! – djcmm476 Nov 03 '12 at 21:35
  • In Python 3 you need to pass bytes, either as `hashlib.md5(b'alsdkfjasldfjkasdlf')` or as `hashlib.md5('alsdkfjasldfjkasdlf'.encode())` – George Pipis Feb 18 '21 at 16:20
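
Combining the encoding point above with the loop from the question, a rough Python 3 sketch might look like the following. It assumes `doc` is a list of strings (the question doesn't say), and `ngram_hashes` and the sample data are made up for illustration:

```python
import hashlib

def ngram_hashes(doc, n=4):
    """Yield (ngram, MD5-as-integer) for every n-character window in each sentence."""
    for sentence in doc:
        for i in range(len(sentence) - n + 1):
            ngram = sentence[i:i + n]
            # encode first: hashlib.md5 only accepts bytes in Python 3
            digest = hashlib.md5(ngram.encode("utf-8")).hexdigest()
            yield ngram, int(digest, 16)

doc = ["hello world", "abc"]  # made-up sample data
results = list(ngram_hashes(doc))
```

Sentences shorter than `n` characters simply produce no n-grams, so `"abc"` contributes nothing here.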