
I'm currently trying to implement a hashtable/trie. When I pass parameters to MurmurHash2 it does return a number, but I also get runtime errors about unsigned integer overflow:

test.c:53:12: runtime error: unsigned integer overflow: 24930 * 1540483477 cannot be represented in type 'unsigned int'

test.c:60:4: runtime error: unsigned integer overflow: 2950274797 * 1540483477 cannot be represented in type 'unsigned int' 6265

I've put a bunch of stars (*) on lines 53 and 60.

I'm not sure if I'm passing some parameters wrong. Any help would be greatly appreciated!

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed );

int main(void)
{
   const char* s= "aa";
   unsigned int number= MurmurHash2 (s, (int)strlen(s), 1) % 10000;
   printf("%u\n", number);
}

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.

const unsigned int m = 0x5bd1e995;
const int r = 24;

// Initialize the hash to a 'random' value

unsigned int h = seed ^ len;

// Mix 4 bytes at a time into the hash

const unsigned char * data = (const unsigned char *)key;

while(len >= 4)
{
    unsigned int k = *(unsigned int *)data;

    k *= m;
    k ^= k >> r;
    k *= m;

    h *= m;
    h ^= k;

    data += 4;
    len -= 4;
}

// Handle the last few bytes of the input array

switch(len)
{
case 3: h ^= data[2] << 16;
case 2: h ^= data[1] << 8;
case 1: h ^= data[0];
        h *= m; ************************************************
};

// Do a few final mixes of the hash to ensure the last few
// bytes are well-incorporated.

h ^= h >> 13;
h *= m;   **************************************
h ^= h >> 15;

return h;
}
Jeff Guan
    Are you building your program with a sanitizer like `-fsanitize=integer`? – nwellnhof Aug 30 '17 at 14:52
  • the stars (`*`) are a great idea for marking the lines... but maybe use `// **** line 53` (for example) so they are a valid C comment...? – Myst Aug 30 '17 at 15:06

2 Answers


It seems that you're building with the UBSan option -fsanitize=unsigned-integer-overflow or some other option like -fsanitize=integer that enables this check. The documentation says:

Note that unlike signed integer overflow, unsigned integer overflow is not undefined behavior. However, while it has well-defined semantics, it is often unintentional, so UBSan offers to catch it.

In the case of MurmurHash, unsigned integer overflow in multiplications is fully intentional, so you should disable the option.

  • If you use -fsanitize=unsigned-integer-overflow explicitly, remove it.
  • If it is enabled by another option, pass -fno-sanitize=unsigned-integer-overflow.
  • Alternatively, annotate the function MurmurHash2 with __attribute__((no_sanitize("unsigned-integer-overflow"))).

Another note: Your code seems to be copied from the 32-bit reference implementation of MurmurHash2 which assumes 32-bit ints. You should consider using uint32_t instead.
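
For illustration, here is a minimal sketch of the same function with fixed-width types and the attribute mentioned above. This assumes Clang (the attribute spelling is Clang-specific); the name MurmurHash2_32 and the memcpy-based load are my own choices rather than part of the reference implementation:

#include <stdint.h>
#include <string.h>

/* Sketch only: fixed-width 32-bit types, with the whole function opted
   out of the unsigned-integer-overflow check (Clang attribute). */
__attribute__((no_sanitize("unsigned-integer-overflow")))
uint32_t MurmurHash2_32(const void *key, int len, uint32_t seed)
{
    const uint32_t m = 0x5bd1e995;
    const int r = 24;

    uint32_t h = seed ^ (uint32_t)len;
    const unsigned char *data = (const unsigned char *)key;

    while (len >= 4)
    {
        uint32_t k;
        memcpy(&k, data, 4);   /* well-defined 4-byte load */

        k *= m;                /* wrap-around here is intentional */
        k ^= k >> r;
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    switch (len)
    {
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
    }

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return h;
}

The memcpy load also sidesteps the unaligned, aliasing-violating cast *(unsigned int *)data in the posted code; that is a separate, optional cleanup unrelated to the sanitizer report.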

nwellnhof

unsigned int has a system-dependent number of bits.

On most systems this is 32 bits (4 bytes), but some systems use different sizes (e.g. 64 bits (8 bytes) on some machines).

However, MurmurHash "words" are a specific size: the 64-bit variant requires a 64-bit unsigned type and the 32-bit variant requires a 32-bit unsigned type.

This inconsistency can be resolved by using the uint64_t or uint32_t types defined in <stdint.h>.

I would add that the suffix UL (unsigned long) should probably be added to any numerical constants you use, e.g. 2950274797UL * 1540483477UL.

As indicated by @nwellnhof, your code seems to use the 32 bit variant of the algorithm.

The overflow in the multiply instruction is normal in these cases (the result is larger than the number of available bits and is truncated). This loss of data is acceptable as part of the hashing process.

Consider informing the compiler about the expected result using:

 h = (uint32_t)(((uint64_t)h * m) & 0xFFFFFFFF);
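
To make that concrete, here is a small self-contained sketch; the constant values are just taken from the error message in the question, and the variable names are illustrative:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint32_t m = 0x5bd1e995;  /* the MurmurHash2 mixing constant */
    uint32_t h = 2950274797u;       /* intermediate value from the error message */

    /* Widen to 64 bits before multiplying, then truncate back to 32 bits
       explicitly, so the wrap-around is spelled out rather than implicit. */
    h = (uint32_t)(((uint64_t)h * m) & 0xFFFFFFFF);

    printf("%u\n", h);
    return 0;
}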

Good Luck!

Myst
    Given the 32-bit constant `0x5bd1e995`, the OP wants to implement the 32-bit variant. The code seems to be copied from the [reference implementation](https://sites.google.com/site/murmurhash/) which uses `unsigned int`s. – nwellnhof Aug 30 '17 at 15:18
  • @nwellnhof - I hadn't noticed, thanks. I updated the answer to reflect this observation. – Myst Aug 30 '17 at 15:28
  • An `unsigned int` type with 8 bits would not conform to the C specification. It needs at least 16 bits. – Ian Abbott Aug 30 '17 at 15:41
  • @IanAbbott - oops... I meant 8 bytes (64 bits). Some machine interfaces use a larger `int` on chips with 128-bit instructions. Thanks for the catch! – Myst Aug 30 '17 at 15:43
  • Thanks! Line 60 still erred so I used h*=m%2^32 to get it back in range. – Jeff Guan Aug 30 '17 at 15:58
  • @JeffGuan - I'm happy to help. as to the "h*=m%2^32", I might point out @nwellnhof 's answer regarding compiler flags. The unsigned overflow in the multiplication is expected and should be ignored rather than fixed. As to a fix, I would recommend `h = (uint32_t)( (h*m) & 0xFFFFFFFF)` instead of a modulus operation. (I updated the answer with this tip). – Myst Aug 30 '17 at 16:43