I have a set of structs that are uniquely identified by a uint16_t. There will never be more than 256 of these structs (for reasons I won't get into, a uint16_t must be used to identify them).

I would like to store these structs via an array of pointers. Since there will never be more than 256 structs, I would like to statically allocate an array of struct pointers with size 256. To do this though, I need a function to uniquely map uint16_t onto uint8_t.

Given that I will know all of the keys at runtime (though not before runtime), is there an algorithm that will give me a unique mapping (i.e. a perfect hash) in general?

One caveat is that the system I am using has 16-bit addresses. So for efficiency reasons I don't want to use any types larger than uint16_t.

HXSP1947

2 Answers

Given that I will know all of the keys at runtime (though not before runtime), is there an algorithm that will give me a unique mapping (i.e. a perfect hash) in general?

Given that you have up to 256 (16-bit) values to map, there are in principle many mappings you could use. If the keys to be supported are uncorrelated, however, then any algorithm to compute the mappings requires all 256 keys, or functions of them, as parameters. In comments, for example, the idea of a 256th-degree polynomial was discussed; the parameters there would be the coefficients of the polynomial.

Now consider: since the mapping needs 256 parameters, it will also use all of those parameters in some way. How, then, can something with these general characteristics be efficiently computable?

The best solution I see for uncorrelated keys is to put all the keys in an array, sort it, and use each key's index in the sorted array as the desired hash value. (The parameters in this case are thus the keys themselves.) You can compute those indices fairly efficiently by binary search. Supposing that you store the sorted array for the duration of the program, I don't think you can do such a computation much more efficiently than this, and it's simple enough that you can be confident of its correctness.
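As a sketch of that approach (names such as `build_index` and `hash_of` are mine, not from the question), the whole thing fits in a handful of lines and uses no type wider than uint16_t apart from the promoted arithmetic the compiler does anyway:

```c
#include <stdint.h>
#include <stdlib.h>

static uint16_t sorted_keys[256];
static uint8_t key_count;

/* qsort comparator for uint16_t values */
static int cmp_u16(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a, y = *(const uint16_t *)b;
    return (x > y) - (x < y);
}

/* Call once, after all keys are known (n >= 1). */
void build_index(const uint16_t *keys, uint8_t n)
{
    uint8_t i;
    for (i = 0; i < n; i++)
        sorted_keys[i] = keys[i];
    key_count = n;
    qsort(sorted_keys, n, sizeof sorted_keys[0], cmp_u16);
}

/* Binary search: returns the key's index in the sorted array,
 * i.e. its perfect hash in 0..n-1. Assumes the key is present. */
uint8_t hash_of(uint16_t key)
{
    uint8_t lo = 0, hi = (uint8_t)(key_count - 1);
    while (lo < hi) {
        uint8_t mid = (uint8_t)((lo + hi) / 2);
        if (sorted_keys[mid] < key)
            lo = (uint8_t)(mid + 1);
        else
            hi = mid;
    }
    return lo;
}
```

With the keys [1000, 989, 15, 17] from the comment below, `build_index` sorts them to [15, 17, 989, 1000], so `hash_of(15)` is 0 and `hash_of(1000)` is 3.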

That assumes you know all the keys before you need to hash any of them. If that is not the case then at minimum you can use an unsorted array and a linear search (though there may be in-between approaches, too). A linear search may not seem particularly efficient, but it's unlikely to be worse on average than a purely arithmetic computation involving 256 parameters.
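With an unsorted array, the same idea reduces to a linear scan; again a sketch with assumed names, where 0xFF is used as a "not registered" sentinel (which only works if index 255 is never a valid slot, i.e. fewer than 256 keys are in use):

```c
#include <stdint.h>

static uint16_t keys_unsorted[256];
static uint8_t unsorted_count;

/* Returns the key's position in the unsorted array; that position
 * serves as the hash. 0xFF signals "key not registered". */
uint8_t hash_of_linear(uint16_t key)
{
    uint8_t i;
    for (i = 0; i < unsorted_count; i++)
        if (keys_unsorted[i] == key)
            return i;
    return 0xFF;
}
```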

John Bollinger
  • How would using the indices as hash values solve my problem? While the hash it would produce would be unique, how would I later recalculate the hash? Suppose the keys are [1000, 989, 15, 17], which would give hashes [3, 2, 0, 1]. If I later wanted the hash for 1000, wouldn't I have to linearly search for it? That somewhat defeats the purpose of using a hash. If I wanted to take that route, a better solution would be to put all of my elements into a linked list and just search for the struct with the appropriate key. – HXSP1947 Jun 20 '18 at 21:55
  • Yes, @HXSP1947, the hash computation would be a search. If you keep a sorted array of the keys then it would be a binary search; if an unsorted array then a linear search. – John Bollinger Jun 21 '18 at 02:31

I ended up using a first-fit algorithm to uniquely map 16-bit values onto 8-bit values (it works under the assumption that there are no more than 256 16-bit values). Below is a very short example that I coded up to test it. While the mapping function (`create_mapping` below) is fairly expensive, `get_value` runs in constant time. Therefore, once the mapping is established, it should be fairly quick to calculate the hash (given by `remainder + offset[divisor]` in my example) and get the associated value.

#include <stdint.h>
#include <stdio.h>

uint16_t keys[256];           /* the 16-bit keys, filled in before create_mapping() */
uint16_t actual_mapping[256]; /* hash slot -> key */
uint8_t offset[256];          /* per-high-byte offset added to the low byte */
uint8_t num_keys = 0;

void 
create_mapping()
{
    uint8_t mapping_matrix[num_keys][2];

    uint8_t index;
    uint8_t test_index;
    /* Split each key into its high byte ([0]) and low byte ([1]). */
    for(index = 0; index < num_keys; index++)
    {
        mapping_matrix[index][0] = (uint8_t) (keys[index] / 256);
        mapping_matrix[index][1] = keys[index] % 256;
    }

    for(index = 0; index < num_keys - 1; index++)
    {
        uint8_t hash_not_found = 1;
        while(hash_not_found)
        {
            hash_not_found = 0;
            for(test_index = index + 1; test_index < num_keys; test_index++)
            {
                if(mapping_matrix[index][0] != mapping_matrix[test_index][0])
                {
                    if((uint8_t) (mapping_matrix[index][1] + offset[mapping_matrix[index][0]]) == (uint8_t) (mapping_matrix[test_index][1] + offset[mapping_matrix[test_index][0]]))
                    {
                        hash_not_found = 1;
                        offset[mapping_matrix[index][0]]++;
                        break;
                    }
                }
            }
        }

        actual_mapping[(uint8_t) (mapping_matrix[index][1] + offset[mapping_matrix[index][0]])] = keys[index];
    }

    /* The loop above stops one short of the last key, so record its mapping too. */
    actual_mapping[(uint8_t) (mapping_matrix[num_keys - 1][1] + offset[mapping_matrix[num_keys - 1][0]])] = keys[num_keys - 1];
}

uint16_t
get_value(uint16_t value)
{
    uint8_t divisor = (uint8_t) (value / 256);
    uint8_t remainder = value % 256;
    return actual_mapping[(uint8_t) (remainder + offset[divisor])];
}

int main(int argc, char** argv) {

    keys[0] = 0;
    keys[1] = 256;
    keys[2] = 4;
    keys[3] = 13000;
    keys[4] = 14000;
    keys[5] = 15000;
    keys[6] = 16000;
    keys[7] = 3500;
    keys[8] = 69;
    keys[9] = 15;
    keys[10] = 16;
    keys[11] = 789;
    keys[12] = 12001;

    num_keys = 13;

    create_mapping();

    uint8_t index;
    for(index = 0; index < num_keys; index++)
    {
        printf("%hu\n", get_value(keys[index]));
    }

    return 0;
}
HXSP1947
  • Whereas I'm prepared to believe that this general approach can work, it looks like there are several problems with the actual code you present. To begin, be aware that your arithmetic on `uint8_t` operands will be done with operands promoted to `int` and evaluating to an `int`, which can produce results larger than 255. – John Bollinger Jun 21 '18 at 04:12
  • I'm also very suspicious of your computation of elements of `offset`. It looks like after you choose an offset that works for one key with respect to those that follow it, you may later change that offset to accommodate a different key with the same most-significant byte, without checking whether the new offset also works for the first key. – John Bollinger Jun 21 '18 at 04:16
  • Moreover, I find it odd that in the innermost block you are updating the `offset` for `index` rather than that for `test_index`. I'm not *confident* that that's wrong, but that's a second-order issue of its own: if this code is in fact correct, it is difficult to reason out why. – John Bollinger Jun 21 '18 at 04:22
  • @JohnBollinger "To begin, be aware that your arithmetic on uint8_t operands will be done with operands promoted to int and evaluating to an int, which can produce results larger than 255", I'm not sure where you're looking. Could you be a bit more specific? – HXSP1947 Jun 21 '18 at 04:23
  • I'm looking all over, but most importantly at your computations of the form `remainder + offset[divisor]`. You rely on these to produce results in the range 0-255, but it is not certain that they will do. You need to consistently cast the results of those particular computations to `uint8_t`. – John Bollinger Jun 21 '18 at 04:29
  • @JohnBollinger, good catch. Should be fixed. "It looks like after you choose an offset that works for one key with respect to those that follow it, you may later change that offset to accommodate a different key with the same most-significant byte, without checking whether the new offset also works for the first key." If I'm understanding you correctly here you are saying things can and most likely will go south if the key is later changed? I agree, after the keys are selected they will not change. – HXSP1947 Jun 21 '18 at 04:39
  • No, I'm saying that it looks like the elements of `offset` computed by `create_mapping()` are not certain to yield distinct mappings for all keys known at the time it runs. One way to help with that would be to sort the keys before computing (any of) the offsets, so that all keys with the same most-significant byte (which determines the relevant `offset`) are handled together. Sorting does not by itself provide a complete solution to the problem, however. – John Bollinger Jun 21 '18 at 04:46
  • @JohnBollinger, If you're saying that create_mapping() is not guaranteed to find a mapping, I partially agree. A mapping may not exist if there are more than 256 keys. However, if there are fewer than 256 keys, a mapping will be found, since the algorithm is essentially brute force. The idea is the following: (1) pick a key and increment its hash by 1 until it has a unique hash; (2) go to the next key and repeat (1). Since the hash is represented by 0-255 and there are fewer than 256 elements, a valid hash must exist. The key idea is the integer overflow, which I am abusing to get what I want. – HXSP1947 Jun 21 '18 at 04:49
  • I had made a mistake in how I implemented the overflow though (and you caught) that I believe I have since corrected. – HXSP1947 Jun 21 '18 at 04:52
  • But in fact that's *not* what you do, nor does your approach afford that. Whenever you find a collision and therefore increment an element of `offset`, you are changing the hash values of *every* key with a certain most-significant byte. That may be many keys, including some that you've already checked for collisions, and the change invalidates those prior collision tests. You need to collision-check all the keys relying on the same offset together. Moreover, it's only meaningful to test them against keys whose offsets are already decided. – John Bollinger Jun 21 '18 at 05:11
  • To be clear: nothing I am saying relies on there being more than 256 keys, nor on keys changing or being added after the initial `create_mapping()` computation. – John Bollinger Jun 21 '18 at 05:19