
I want to hash only 64-bit integers. I am using the implementation of MurmurHash3 given here. Can the code be improved given this constraint? I have not been able to figure it out completely, but I think the for loop at line 171 might be the target. Please suggest something.

Aman Deep Gautam



If you only need to hash 64-bit numbers, then use the number's value directly, because all MurmurHash3 will do is waste CPU cycles mixing the same input number into the same output number. The only exception is if you are changing the seed.
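A minimal C++ sketch of that point: `identity_hash` is a made-up name, and `fmix64` mirrors the 64-bit finalizer of the same name in the reference MurmurHash3.cpp. Every step of `fmix64` (xor-shift, multiply by an odd constant) is invertible, so on 64-bit keys it is one-to-one just like the identity: it cannot remove collisions that the raw value does not already avoid. What it changes is how the bits are spread, which only matters if your table uses a subset of the bits as the bucket index.

```cpp
#include <cassert>
#include <cstdint>

// Identity "hash": a 64-bit key is already a unique 64-bit value,
// so returning it unchanged never produces a collision.
inline std::uint64_t identity_hash(std::uint64_t key) { return key; }

// 64-bit finalizer as in the reference MurmurHash3 source.
// Each step is a bijection on 64-bit values, so distinct inputs
// still give distinct outputs; it only scrambles the bits.
inline std::uint64_t fmix64(std::uint64_t k) {
  k ^= k >> 33;
  k *= 0xff51afd7ed558ccdULL;
  k ^= k >> 33;
  k *= 0xc4ceb9fe1a85ec53ULL;
  k ^= k >> 33;
  return k;
}
```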

If you really want to optimize for a fixed size, you can copy the function and alter it slightly, letting the compiler's constant propagation and constant folding do the heavy lifting:

```cpp
#include <stdint.h>

void MurmurHash3_x86_128_uint64 ( const void * key, uint32_t seed, void * out)
{
  const int len = sizeof(uint64_t);
  // now len is a compile-time constant and can be folded even when this
  // function is not inlined (otherwise it would just be a constant input,
  // which could only be folded when the function is inlined)
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 16; // 8 / 16 == 0, so the block loop is dead code

  // ... rest of the MurmurHash3_x86_128 body unchanged ...
}
```
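For illustration, here is roughly what is left once `len` is the constant 8, folded by hand into a C++ sketch. It is reconstructed from the reference `MurmurHash3_x86_128` in SMHasher rather than taken from this answer; the name `murmur3_x86_128_u64_folded` and the by-value key parameter are assumptions of the sketch (taking the value matches hashing the key's bytes on a little-endian machine), and its output should be checked against the reference implementation before you rely on it.

```cpp
#include <cassert>
#include <cstdint>

inline std::uint32_t rotl32(std::uint32_t x, int r) {
  return (x << r) | (x >> (32 - r));
}

// 32-bit finalizer, as in the reference code.
inline std::uint32_t fmix32(std::uint32_t h) {
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

// Hypothetical hand-folded MurmurHash3_x86_128 for len == 8:
// nblocks == 0, so only tail cases 8..1 and the finalization remain.
void murmur3_x86_128_u64_folded(std::uint64_t key, std::uint32_t seed,
                                std::uint32_t out[4]) {
  const std::uint32_t c1 = 0x239b961bu, c2 = 0xab0e9789u, c3 = 0x38b34ae5u;
  const std::uint32_t len = 8;
  std::uint32_t h1 = seed, h2 = seed, h3 = seed, h4 = seed;

  // Tail bytes 4..7 (high half of the key) feed k2.
  std::uint32_t k2 = static_cast<std::uint32_t>(key >> 32);
  k2 *= c2; k2 = rotl32(k2, 16); k2 *= c3; h2 ^= k2;

  // Tail bytes 0..3 (low half of the key) feed k1.
  std::uint32_t k1 = static_cast<std::uint32_t>(key);
  k1 *= c1; k1 = rotl32(k1, 15); k1 *= c2; h1 ^= k1;

  // Finalization, same sequence as the reference code.
  h1 ^= len; h2 ^= len; h3 ^= len; h4 ^= len;
  h1 += h2; h1 += h3; h1 += h4;
  h2 += h1; h3 += h1; h4 += h1;
  h1 = fmix32(h1); h2 = fmix32(h2); h3 = fmix32(h3); h4 = fmix32(h4);
  h1 += h2; h1 += h3; h1 += h4;
  h2 += h1; h3 += h1; h4 += h1;

  out[0] = h1; out[1] = h2; out[2] = h3; out[3] = h4;
}
```

Two things fall out of the folded form: the block loop the question points at disappears entirely, and because `h3` and `h4` start equal and receive no key bytes when `len` is 8, the third and fourth output words are always identical.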

If you are using C++ at any later stage, it would make sense to turn this into a template along the lines of:

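```cpp
#include <stddef.h>
#include <stdint.h>
template<size_t len>
void MurmurHash3_x86_128_uint64 ( const void * key, uint32_t seed, void * out)
{
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 16; // compile-time constant for every instantiation
  // ... rest of the MurmurHash3_x86_128 body unchanged ...
}
```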
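The payoff of the template is that everything the function derives from `len` is a compile-time constant in each instantiation. A small C++ sketch of just that part; the `MurmurLayout` helper is hypothetical and does not compute a hash:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the two values MurmurHash3_x86_128 derives from len.
// With len as a template argument both are compile-time constants, so the
// block loop and the tail switch can be resolved per instantiation.
template <std::size_t len>
struct MurmurLayout {
  static constexpr std::size_t nblocks = len / 16;     // full 16-byte blocks
  static constexpr std::size_t tail_bytes = len & 15;  // bytes left for the tail switch
};

// A uint64_t key is shorter than one block, so the block loop never runs.
static_assert(MurmurLayout<sizeof(std::uint64_t)>::nblocks == 0,
              "8-byte keys have no full blocks");
```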

Also note that some smarter compilers (ICC, MSVC, GCC) will detect if a function is only ever called with the same constant arguments (including partly constant argument lists) and fold those constants into the function body. Across translation units this requires the "whole program optimization" (link-time optimization) option to be enabled, e.g. /GL for MSVC or -flto for GCC.

Necrolis
  • Thanks for the answer. It was a great help. I also want to ask where I can read about MurmurHash. I am beating around the bush without any insight into the algorithm; right now I am decoding it from the code itself, and I would have missed the optimization you suggested for the `mix` function. – Aman Deep Gautam Jun 15 '12 at 07:25
  • @AmanDeepGautam: the only real place is the link you already have (http://code.google.com/p/smhasher/) and the source code. Other than that, you can try http://burtleburtle.net/bob/hash/doobs.html, which lists many hash functions and may be helpful (and http://www.team5150.com/~andrew/noncryptohashzoo/ for a breakdown of their characteristics). – Necrolis Jun 15 '12 at 08:31
  • I implemented an altered version of `MurmurHash3_x86_128_uint64`, but I didn't remove the `fmix` part from the implementation as you suggested ("as they were a waste of CPU cycles"). Since there was not much proof of how this hash function works, I decided it's better not to mess with its structure. What I did was take the data from the 64-bit int twice and cast it to a 64-bit `uint64_t` both times. This way the structure of the hash calculation does not change and I get a 128-bit hash value. I would like your comment on this approach. – Aman Deep Gautam Jun 15 '12 at 11:34
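One way to read that approach, as a C++ sketch: write the same 64-bit value into both halves of a 16-byte buffer, so that a 128-bit variant such as `MurmurHash3_x64_128` from the reference code processes exactly one full block (`len = 16`). The `duplicate_key` helper is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: lay the same 64-bit key out twice, in native
// byte order, so the hash sees one full 16-byte block.
inline std::array<unsigned char, 16> duplicate_key(std::uint64_t key) {
  std::array<unsigned char, 16> buf{};
  std::memcpy(buf.data(), &key, sizeof key);      // bytes 0..7
  std::memcpy(buf.data() + 8, &key, sizeof key);  // bytes 8..15, same value
  return buf;
}
```

Usage would look like `auto buf = duplicate_key(k); MurmurHash3_x64_128(buf.data(), 16, seed, out);`. Keep in mind that repeating the key adds no information: there are still only 2^64 distinct inputs, so at most 2^64 of the 2^128 possible outputs can ever occur.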