I'm trying to adapt MurmurHash into a program built for a class, but I can't seem to find explicit confirmation about what the variables represent.

I'm using the following as reference:

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.

    const unsigned int m = 0x5bd1e995;
    const int r = 24;

    // Initialize the hash to a 'random' value

    unsigned int h = seed ^ len;

    // Mix 4 bytes at a time into the hash

    const unsigned char * data = (const unsigned char *)key;

    while(len >= 4)
    {
        unsigned int k = *(unsigned int *)data;

        k *= m; 
        k ^= k >> r; 
        k *= m; 

        h *= m; 
        h ^= k;

        data += 4;
        len -= 4;
    }

    // Handle the last few bytes of the input array

    switch(len)
    {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0];
            h *= m;
    };

    // Do a few final mixes of the hash to ensure the last few
    // bytes are well-incorporated.

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return h;
} 

As I understand it, hash functions take some value and put it into a hash table. Is "len" the size of the hash table and "key" the value to be hashed?

  • "As I understand it, hash functions take some value and put it into a hash table" - you understand wrong. A hash function produces an integer value from some more complex value. –  Jul 08 '17 at 22:57

1 Answer

Here's what they represent:

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )

key - points to the array of bytes that you want to generate a hash value for.

len - the number of bytes that key points to (or at least, the number of bytes you want included in the input from which the hash value is computed).

seed - pick whatever value you want for this; you'll get different hash codes for a given input if you use different seed values. If in doubt, just always pass in zero.

Returns a hash value computed from the passed-in bytes. You'll always get the same hash value back for the same byte sequence, provided you also pass in the same seed value. For different byte sequences the returned value will vary considerably: even a small difference in the input bytes will probably produce a very different hash value.
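As a rough illustration of how the three arguments fit together, here is a minimal sketch. It assumes a 32-bit unsigned int, as the reference code does. MurmurHash2 is the function from the question with one change that is mine, not the original's: each 4-byte block is read with memcpy rather than through a pointer cast, to avoid unaligned reads. The helpers hash_string, hash_int and show_hash_examples are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* MurmurHash2 as quoted in the question. Assumed change: blocks are read
   with memcpy instead of *(unsigned int *)data. Assumes 32-bit unsigned int. */
unsigned int MurmurHash2(const void *key, int len, unsigned int seed)
{
    const unsigned int m = 0x5bd1e995;
    const int r = 24;
    unsigned int h = seed ^ (unsigned int)len;
    const unsigned char *data = (const unsigned char *)key;

    while (len >= 4) {
        unsigned int k;
        memcpy(&k, data, 4);   /* read the next 4 input bytes */
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
        data += 4;
        len -= 4;
    }

    switch (len) {             /* mix in the 0 to 3 leftover bytes */
    case 3: h ^= (unsigned int)data[2] << 16; /* fall through */
    case 2: h ^= (unsigned int)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
    }

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}

/* Hypothetical helper: for a C string, len is strlen(s), so the
   terminating NUL is not part of the hashed input. */
unsigned int hash_string(const char *s, unsigned int seed)
{
    return MurmurHash2(s, (int)strlen(s), seed);
}

/* Hypothetical helper: for a single int, len is sizeof(int). */
unsigned int hash_int(int value, unsigned int seed)
{
    return MurmurHash2(&value, (int)sizeof value, seed);
}

/* Hypothetical demo: prints a few hash codes. */
void show_hash_examples(void)
{
    /* Same bytes and same seed always give the same hash code. */
    assert(hash_string("hello", 0) == hash_string("hello", 0));

    printf("\"hello\", seed 0: %u\n", hash_string("hello", 0));
    printf("\"hello\", seed 1: %u\n", hash_string("hello", 1));
    printf("\"hellp\", seed 0: %u\n", hash_string("hellp", 0));
    printf("42,      seed 0: %u\n", hash_int(42, 0));
}
```

Calling show_hash_examples() from main prints four hash codes. The exact values of the int hash depend on the platform's byte order and int size, since the function hashes raw bytes.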

From the question: "As I understand it, hash functions take some value and put it into a hash table. Is 'len' the size of the hash table and 'key' the value to be hashed?"

That's incorrect. MurmurHash2() only computes a hash code. It could be useful as part of a hash table implementation, but it does not itself implement a hash table, and len has nothing to do with the size of any table.
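To make the distinction concrete, here is a minimal sketch of how a hash table might use the hash code. The table, not MurmurHash2, decides how many buckets there are, and reduces the hash code to a bucket number itself. It assumes a 32-bit unsigned int, and it reads each block with memcpy instead of the original pointer cast (my change, not the reference code's). bucket_index and print_buckets are hypothetical names for this example, and taking the remainder is only one common way to pick a bucket.

```c
#include <stdio.h>
#include <string.h>

/* MurmurHash2 as quoted in the question. Assumed change: blocks are read
   with memcpy instead of *(unsigned int *)data. Assumes 32-bit unsigned int. */
unsigned int MurmurHash2(const void *key, int len, unsigned int seed)
{
    const unsigned int m = 0x5bd1e995;
    const int r = 24;
    unsigned int h = seed ^ (unsigned int)len;
    const unsigned char *data = (const unsigned char *)key;

    while (len >= 4) {
        unsigned int k;
        memcpy(&k, data, 4);
        k *= m;
        k ^= k >> r;
        k *= m;
        h *= m;
        h ^= k;
        data += 4;
        len -= 4;
    }

    switch (len) {
    case 3: h ^= (unsigned int)data[2] << 16; /* fall through */
    case 2: h ^= (unsigned int)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
    }

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}

/* Hypothetical helper: the table size is a separate parameter of the table
   code. len is still just the key's length in bytes.
   bucket_count must be nonzero. */
unsigned int bucket_index(const void *key, int len, unsigned int bucket_count)
{
    return MurmurHash2(key, len, 0) % bucket_count;
}

/* Hypothetical demo: shows which of 8 buckets a few string keys land in. */
void print_buckets(void)
{
    const char *names[] = { "alice", "bob", "carol" };
    const unsigned int bucket_count = 8;
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        unsigned int b = bucket_index(names[i], (int)strlen(names[i]),
                                      bucket_count);
        printf("%s -> bucket %u\n", names[i], b);
    }
}
```

A real table would also have to store the keys and handle two keys landing in the same bucket; none of that is part of MurmurHash2.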

Jeremy Friesner
  • I still don't get what exactly the `len` is supposed to be? – TomSawyer May 05 '20 at 08:43
  • Key is a pointer to something, but it’s a void pointer, so it conveys no type information. Therefore the MurmurHash2() function has no way to automatically determine how many bytes of memory it should read starting at that address; you have to tell it explicitly by passing in the appropriate byte count as an argument. For example, if key pointed to an int, you would pass in sizeof(int); if key pointed to a C string, you might pass in strlen(key). – Jeremy Friesner May 05 '20 at 13:33