A couple of thoughts:
First, overflow is expected in a hash function.
Second, since your function includes a 31*hashval
, and every element in string must have at least a value of 1, you would expect that the longest string you can have before you hit overflow is a string of all \x01, and it will overflow the hash when it gets to a length of 6 (since the *31
operation distributes the entire number over the 5 bits to the left, there is going to be carry, meaning you're likely to affect the sixth bit, and 6*6 = 36 > 32). The number will be less when the bytes are larger (the first byte pretty much defines the behavior - when it is large you may get overflow after five bytes). It is easier to show this with the real bits and bytes. I'm going to use a *32
rather than a *31
algorithm (not quite right, but less carry to worry about, al you'll get the idea):
byte hash is less than:
0000a000 00000000 00000000 00000000 0000a000
10000000 00000000 00000000 000000a0 10000000
b0000000 00000000 00000000 a0100000 b0000000
c0000000 00000000 00a01000 00b00000 c0000000
d0000000 0000a010 0000b000 00c00000 d0000000
anything OVERFLOW!
As was pointed out above, you can improve the predictable behavior of your (rather poor) hashing algorithm by declaring everything as unsigned integer; I would also recommend initializing the hash (and a value other than zero might be a good idea), rather than assuming that the compiler will set it to zero (I am not 100% sure that's defined behavior). Finally, if you are wondering about overflow, and want to get a warning, I would modify the code as follows:
for(p = s; *p; p++) {
if((hashval > 0xFFFFFFFF/31) || (*p>>1 + 31 * (hashval>>1)) > 0x7FFFFFFF) {
printf("hash is about to overflow at character %c\n", *p);
}
hashval = *p + 31 * hashval;
}