I don't understand why the integer value "hash" gets lower in/after the third loop iteration.

I would guess this happens because the uint limit is 2,147,483,647. But when I step through the code, the value is 2146134658. I'm not that good at math, but that should be below the limit.

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* exit */
#include <string.h>  /* strlen */

#define FNV_PRIME_32 16777619
#define FNV_OFFSET_32 2166136261U

unsigned int hash_function(const char *string, unsigned int size)
{
    unsigned int str_len = strlen(string);

    if (str_len == 0) exit(0);

    unsigned int hash = FNV_OFFSET_32;

    for (unsigned int i = 0; i < str_len; i++)
    {
        // Cast so a negative char is not sign-extended before the XOR
        hash = hash ^ (unsigned char)string[i];
        // Multiply by prime number found to work well
        hash = hash * FNV_PRIME_32;
        if (hash > 765010506)
            printf("YO!\n");
        else
            printf("NOO!\n");
    }
    return hash % size;
}  

In case you are wondering, this if statement is only there for my own debugging.

if (hash > 765010506)
    printf("YO!\n");
else
    printf("NOO!\n");

765010506 is the value for hash after the next run through the loop.

David C. Rankin
  • Why are you calculating the length of `string` every time through the loop? Never do that. – Tom Karzes Dec 27 '18 at 04:23
  • OK, I see. I could have done something like int x = size; – Tim Weissenfels Dec 27 '18 at 04:24
  • Right. Basically you took an O(n) loop and made it an O(n**2) loop. – Tom Karzes Dec 27 '18 at 04:25
  • Regarding your question, when the result of the multiplication of your `hash` variable is too large to fit in an `unsigned int`, the assignment to `hash` will only keep the low-order bits of the result. The high-order bits will be lost. – Tom Karzes Dec 27 '18 at 04:43
  • OK, that's what I thought. But as I mentioned, why is this happening? I mean, my value is close to the limit, but... – Tim Weissenfels Dec 27 '18 at 04:45
  • When you multiply two integers, the number of digits in the product will be either the sum of the number of digits of the two operands, or one less than the sum. For example, `1000 * 1000 = 1000000` - 4 digits times 4 digits producing 7 digits. You're multiplying by a 10-digit number, which will increase the digits by 9 or 10. So it will overflow very quickly. – Tom Karzes Dec 27 '18 at 04:51
  • Why shouldn't it get lower? You are using a binary operation on it, then multiplication. What is the input? What is the output - the hash values in each loop iteration? What is the expected output? – KamilCuk Dec 27 '18 at 05:34
  • @KamilCuk **Why shouldn't it get lower?** Because it shouldn't overflow. **What is the input?** Something like hash_function("HELLO",40); 40 is, for example, the size of the array or hash table. **What is the expected output?** Just numbers below the given size. – Tim Weissenfels Dec 27 '18 at 05:41
  • I don't understand why any of this is confusing. Of *course* it overflows. Each time through the loop you're basically adding 9-10 digits (unless the value is 0 right before you multiply). Under normal circumstances, it should overflow within two iterations. – Tom Karzes Dec 27 '18 at 06:31
  • Yes, you are right! – Tim Weissenfels Dec 27 '18 at 06:36
  • Have you considered using existing well-defined string hash functions like `murmur3` or OpenSSL `lh_strhash`? – David C. Rankin Dec 27 '18 at 07:49

2 Answers

I don't understand why the integer value "hash" gets lower in/after the third loop iteration.

All unsigned integer arithmetic in C is modular arithmetic. For unsigned int, it is modulo UINT_MAX + 1; for unsigned long, modulo ULONG_MAX + 1, and so on.

(a modulo m means the remainder of a divided by m; in C, a % m if both a and m are unsigned integer types.)

On many current architectures, unsigned int is a 32-bit unsigned integer type, with UINT_MAX == 4294967295.
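As a minimal sketch of what "modulo UINT_MAX + 1" means for multiplication, the two functions below should always agree. The names `mul_wrapped` and `mul_reduced` are hypothetical helpers made up for this illustration, and the second one assumes `unsigned long long` is wider than `unsigned int` (true when `unsigned int` is 32 bits).

```c
#include <limits.h>

/* Hypothetical helper: the product exactly as C computes it for unsigned int. */
unsigned int mul_wrapped(unsigned int a, unsigned int b)
{
    return a * b;   /* the language reduces this modulo UINT_MAX + 1 */
}

/* Hypothetical helper: the same product computed in a wider type and then
   reduced by hand. Assumes unsigned long long is wider than unsigned int. */
unsigned int mul_reduced(unsigned int a, unsigned int b)
{
    unsigned long long modulus = (unsigned long long)UINT_MAX + 1ULL;
    unsigned long long wide = (unsigned long long)a * b;
    return (unsigned int)(wide % modulus);
}
```

No value is clamped or rejected; the high-order part of the wide product is simply discarded, and `UINT_MAX + 1u` itself evaluates to 0.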

Let's look at what this means in practice, for multiplication (by 65520, which happens to be an interesting value; 2^16 - 16):

unsigned int  x = 1;
int           i;
for (i = 0; i < 10; i++) {
    printf("%u\n", x);
    x = x * 65520;
}

The output is

1
65520
4292870400
50327552
3221291008
4293918720
16777216
4026531840
0
0

What? How? How come the result ends up zero? That cannot happen!

Sure it can. In fact, you can show mathematically that it happens eventually whenever the multiplier is even, and the modulo is with respect to a power of two (2^32, here).
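The reason is that every multiplication by an even number adds at least one factor of two to the value, and once the value has 32 factors of two it is a multiple of 2^32, which is 0 in 32-bit arithmetic. 65520 = 2^4 × 4095 adds four factors per step, so eight steps suffice. A rough sketch that counts the steps, where `steps_until_zero` is a hypothetical name chosen for this example:

```c
#include <stdint.h>

/* Hypothetical helper: count multiplications by `multiplier` (modulo 2^32),
   starting from 1, until the value becomes 0. Returns -1 if that has not
   happened after 32 steps, which is the case for every odd multiplier. */
int steps_until_zero(uint32_t multiplier)
{
    uint32_t x = 1;
    for (int step = 1; step <= 32; step++) {
        x *= multiplier;
        if (x == 0)
            return step;
    }
    return -1;
}
```

With this sketch, `steps_until_zero(65520u)` returns 8, a multiplier of 2 needs the full 32 steps, and an odd multiplier such as 16777619 never reaches zero because a product of odd numbers stays odd.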

Your particular multiplier is odd, however; so, it does not suffer from the above. However, it still wraps around due to the modulo operation. If we retry the same with your multiplier, 16777619, and a bit longer sequence,

unsigned int  x = 1;
int           i;
for (i = 0; i < 20; i++) {
    printf("%u\n", x);
    x = x * 16777619;
}

we get

1
16777619
637696617
1055306571
1345077009
1185368003
4233492473
878009595
1566662433
558416115
1485291145
3870355883
3549196337
924097827
3631439385
3600621915
878412353
2903379027
3223152297
390634507

In fact, it turns out that this sequence is 1,073,741,824 iterations long (before it repeats itself), and will never yield 0, 2, 4, 5, 6, 7, 8, 10, 12, 13, 14, or 15, for example -- that is, if it starts from 1. It even takes 380 iterations to get a result smaller than 16,777,619 (16,689,137).

For a hash function, that is okay. Each new nonzero input changes the state, so the sequence is not "locked". But, there is no reason to expect the hash value increases monotonically as the length of the hashed data increases; it is much better to assume it is "roughly random" instead: not really random, as it depends on the input only, but also not obviously regular-looking.
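For reference, here is a minimal sketch of the loop from the question with the debugging output removed. XOR first and multiply second is the order used by the FNV-1a variant, so the function is named `fnv1a_32` here; that name, the constant names, and the choice to return the full 32-bit state (leaving `% size` to the caller) are assumptions of this sketch, not part of the original code.

```c
#include <stddef.h>
#include <stdint.h>

static const uint32_t fnv_prime_32  = 16777619u;
static const uint32_t fnv_offset_32 = 2166136261u;

/* Sketch of a 32-bit FNV-1a hash over a NUL-terminated string. */
uint32_t fnv1a_32(const char *s)
{
    uint32_t hash = fnv_offset_32;
    for (size_t i = 0; s[i] != '\0'; i++) {
        hash ^= (unsigned char)s[i];  /* mix in one byte, without sign extension */
        hash *= fnv_prime_32;         /* wraps modulo 2^32 by design */
    }
    return hash;
}
```

A caller would then compute something like `fnv1a_32("HELLO") % 40` to get a table index. The wraparound in the multiplication is what spreads the bits around, so it is intended behaviour rather than a bug.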

Nominal Animal

I would guess this happens because the uint limit is 2,147,483,647.

The maximum value of a 32-bit unsigned integer is roughly 4 billion (2^32 - 1 = 4,294,967,295). The number you're thinking of is the maximum value of a signed 32-bit integer (2^31 - 1).

2,146,134,658 is slightly less than 2^31 (so it would fit even in a signed 32-bit integer), but it's still very close to the limit. Multiplying it by FNV_PRIME_32 -- which is roughly 2^24 -- will give a result of roughly 2^55, which will cause overflow.
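To check that estimate, one option is to compute the product in a 64-bit type and count how many bits it needs. This is only a sketch; `bit_length` and `full_product` are hypothetical helper names introduced for the illustration.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent v (0 for v == 0). */
int bit_length(uint64_t v)
{
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Hypothetical helper: the full, untruncated product of two 32-bit values. */
uint64_t full_product(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Here `bit_length(2146134658u)` is 31 and `bit_length(16777619u)` is 25, while the full product needs 55 bits. A 32-bit `unsigned int` keeps only the low 32 of them, which is why the stored value can end up smaller than before the multiplication.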