
I'm looking to get a hash value for string and integer inputs. Using murmurhash3, I'm able to do it for strings but not integers:

pip install murmurhash3
import mmh3
mmh3.hash(34)

Returns the following error:

TypeError: a bytes-like object is required, not 'int'

I could convert it to bytes like this:

mmh3.hash(bytes(34))

But then I'll get an error if the input is a string.

How do I overcome this without converting the integer to string?

Niv
  • Does this answer your question? [How to efficiently store an int into bytes?](https://stackoverflow.com/questions/56799021/how-to-efficiently-store-an-int-into-bytes) – Sneftel Nov 30 '20 at 15:25
  • Thanks @Sneftel but this only solves the issue of converting integers. As mentioned in my question I know how to do that. I need a way to convert both strings and integers. – Niv Nov 30 '20 at 15:42
  • I edited the title to ensure this point is clear – Niv Nov 30 '20 at 15:46

1 Answer


How do I overcome this without converting the integer to string?

You can't. Or more precisely, you need to convert it to bytes or str in some way, but it needn't be a human-readable text form like b'34'/'34'. A common approach on Python 3 would be:

my_int = 34  # Or some other value
my_int_as_bytes = my_int.to_bytes((my_int.bit_length() + 7) // 8, 'little')

which makes a minimal raw bytes representation of the original int, regardless of its size. For 34, you'd get b'"' (it fits in one byte, so you're basically getting a bytes object holding its ordinal value). It still works for larger ints (unlike mucking about with chr), and it's always as small as possible: you get 8 bits of data per byte, rather than a titch over 3 bits per byte as you'd get converting to a text string.
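To make the size and edge-case behaviour concrete, here is a minimal sketch of that conversion wrapped in a helper. The name int_to_min_bytes is hypothetical (it is not part of mmh3 or the standard library), and the sketch assumes non-negative inputs only.

```python
def int_to_min_bytes(n: int) -> bytes:
    """Smallest little-endian encoding of a non-negative int (hypothetical helper)."""
    if n < 0:
        # Unsigned to_bytes would raise OverflowError; fail with a clearer message.
        raise ValueError("negative values need a signed encoding")
    return n.to_bytes((n.bit_length() + 7) // 8, "little")


print(int_to_min_bytes(34))    # b'"'  (one byte, value 0x22)
print(int_to_min_bytes(1000))  # b'\xe8\x03'
print(int_to_min_bytes(0))     # b''   (bit_length() of 0 is 0, so zero bytes)
print(len(bytes(34)))          # 34    (bytes(34) is 34 zero bytes, not an encoding of 34)
print(len(int_to_min_bytes(2**64)), len(str(2**64)))  # 9 20
```

Two things stand out: bytes(34) from the question does not encode the value 34 at all, and 0 encodes to an empty bytes object, so it would hash the same as empty input unless you reserve at least one byte for it.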

If you're on Python 2 (WHY?!? It's been end-of-life for nearly a year), int.to_bytes doesn't exist, but you can fake it with moderate efficiency in various ways, e.g. (only handling non-negative values, unlike to_bytes which handles signed values with a simple flag):

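from binascii import unhexlify

hex_digits = '%x' % (my_int,)
# unhexlify rejects odd-length input, so left-pad with '0' when needed
# (the result is big-endian, unlike the 'little' example above)
my_int_as_bytes = unhexlify('0' * (len(hex_digits) % 2) + hex_digits)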
ShadowRanger
  • Thanks @ShadowRanger. As mentioned in my question I thought about converting the integer to bytes but then strings won't be accepted. Should I use an if statement that converts integers to bytes but leaves strings alone? I hope there's a better way. – Niv Nov 30 '20 at 15:45
  • @Niv: Yeah, `if isinstance(x, int):` test to control conversion seems the reasonable approach. You have two unrelated data types, you have to handle them differently. An unconditional `str(x)` might work on Python 2 (at the expense of longer strings from `int`), but on Python 3 it'll make `bytes` not work properly. Best to handle different types differently. – ShadowRanger Nov 30 '20 at 15:47
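The isinstance approach from the comment above could look roughly like the sketch below. The helper name to_hash_input is hypothetical, and the choices of a signed little-endian encoding for ints and UTF-8 for strings are assumptions made for illustration, not something mmh3 requires.

```python
def to_hash_input(value):
    """Convert an int, str, or bytes value to bytes for a byte-oriented hash.

    Hypothetical helper, not part of mmh3.
    """
    if isinstance(value, int):
        # +8 rather than +7 leaves room for the sign bit, so negative
        # values work and 0 still occupies one byte.
        length = (value.bit_length() + 8) // 8
        return value.to_bytes(length, "little", signed=True)
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_hash_input(34))    # b'"'
print(to_hash_input("34"))  # b'34'
print(to_hash_input(-1))    # b'\xff'

# With mmh3 installed, the result can be passed straight through:
# import mmh3
# mmh3.hash(to_hash_input(34))
```

With this scheme the int 34 and the string '"' produce identical bytes and therefore the same hash. If ints and strings must never collide, one option is to prepend a one-byte type tag before hashing.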