Search with a mask

Question

There is a big array of entries of the following type:

typedef struct {
    int value;
    int mask;
    int otherData;
} Entry;

I'd like to find an entry in this array for a given int key, as fast as possible. The entry must satisfy (key & mask) == value.

What is the best way to organize such an array, and what is the corresponding lookup algorithm?

Edit: There are no restrictions on the array organization; it is static and can be prepared before lookup. The value and mask may have arbitrary values.

Edit2: value and mask may have arbitrary values, but the number of entries in the array is about 10000. So certain "patterns" can be calculated in advance.

The number of lookups is large.

Is there a guarantee that there's only one entry in the array for a given key ? — SirDarius, Jul 05 '11 at 14:01
are there any restrictions on key and mask? Without some kind of restriction only linear search makes sense — Karoly Horvath, Jul 05 '11 at 14:03
Addition: There is no limitation on the array organization. So, it can be sorted, for example.
It will be enough to find the first entry that applies — Serge C, Jul 05 '11 at 14:07
Please add some additional constraints or tell us if there aren't any. Do you use a limited set of masks? Do you often search for the same mask? Is there a high probability of finding a matching element? ... — Karoly Horvath, Jul 05 '11 at 14:12
Start off by discarding any entries from the array that don't conform to `(value & mask) == value` as they will never match. — SF., Jul 05 '11 at 14:30

Steve Jessop · Accepted Answer · 2011-07-05T14:48:07.600

Each bit is independent, so in a preprocessing phase[*] you could classify each entry 32 (or however big your int is) times. Each classification stores 2 sets: those which match at that bit when key is 0 and those which match when key is 1.

That is, if value == 1 and mask == 0 at that bit, then that classification doesn't store that entry at all, since it doesn't match any value of key (in fact, no matter what scheme you use, such entries should be removed during any preprocessing stage, so no classification should store an entry if even one bit is like this). If value and mask are both 0 at that bit, store the entry into both sets. Otherwise (mask == 1) store it into just one of the two sets: the one for the key bit that equals the value bit.

Then, given your key, you want to find a fast intersection of 32 sets.

Depending on the size of the original array, it may be that the best way to store each set is a giant bit array indicating whether each entry in the array is in the set or not. Then finding the intersection can be done a word at a time - & together 32 words, one from each bit array. If the result is 0, keep going. If the result is non-0, you have a match, and the bit that's set in the result tells you which entry is the match. This is still linear in the size of the array, of course, and in fact you're doing 31 & operations to check 32 entries for a match, which is about the same as the simple linear search through the original array. But there's less comparison and branching, and the data you're looking at is more compressed, so you might get better performance.
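A minimal sketch of the bit-array idea above, not a tuned implementation. It assumes 32-bit unsigned `value` and `mask` (the question uses `int`), a fixed capacity `MAX_ENTRIES`, and the hypothetical names `build_index` and `find_entry`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t value;   /* assumption: unsigned 32-bit instead of int */
    uint32_t mask;
    int otherData;
} Entry;

#define KEY_BITS    32
#define MAX_ENTRIES 10240                /* assumed capacity, multiple of 64 */
#define WORDS       (MAX_ENTRIES / 64)

/* sets[b][v]: bit i is set when entry i can match a key whose bit b equals v. */
static uint64_t sets[KEY_BITS][2][WORDS];

void build_index(const Entry *e, size_t n)
{
    assert(n <= MAX_ENTRIES);
    memset(sets, 0, sizeof sets);
    for (size_t i = 0; i < n; i++) {
        if (e[i].value & ~e[i].mask)
            continue;                    /* value bit outside mask: never matches */
        for (int b = 0; b < KEY_BITS; b++) {
            uint32_t bit = (uint32_t)1 << b;
            int care = (e[i].mask & bit) != 0;
            int want = (e[i].value & bit) != 0;
            for (int v = 0; v < 2; v++)
                if (!care || want == v)  /* don't-care bits go into both sets */
                    sets[b][v][i / 64] |= (uint64_t)1 << (i % 64);
        }
    }
}

/* Returns the index of the first matching entry, or -1 if none matches. */
long find_entry(uint32_t key)
{
    for (size_t w = 0; w < WORDS; w++) {
        uint64_t acc = ~(uint64_t)0;
        /* AND together one word from each of the 32 selected sets. */
        for (int b = 0; b < KEY_BITS && acc; b++)
            acc &= sets[b][(key >> b) & 1][w];
        if (acc) {
            unsigned bit = 0;
            while (!(acc & 1)) {         /* locate the lowest set bit */
                acc >>= 1;
                bit++;
            }
            return (long)(w * 64 + bit);
        }
    }
    return -1;
}
```

Each 64-bit word of the result covers 64 entries at once, so a lookup touches at most `WORDS` words per set rather than every `Entry`.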

Or there might be a better way to do the intersection.

If keys tend to be re-used then you should cache the results of the lookup in a map from keys to entries. If the number of possible keys is reasonably small (that is, if significantly less than 2^32 keys are possible inputs, and/or you have a lot of memory available), then your preprocessing phase could just be:

take each entry in turn
work out which possible keys it matches
add it to the map for those keys
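The three steps above could look roughly like this when the key space is small enough for a direct table. This is a sketch under the assumption that keys are at most 16 bits wide (`KEY_BITS_ASSUMED` is a made-up parameter, as are `build_table` and `lookup`); it keeps the first matching entry for each key:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t value;   /* assumption: unsigned 32-bit instead of int */
    uint32_t mask;
    int otherData;
} Entry;

#define KEY_BITS_ASSUMED 16                      /* hypothetical key width */
#define KEY_COUNT        (1u << KEY_BITS_ASSUMED)
#define KEY_MASK         (KEY_COUNT - 1u)

/* table[key] holds the index of the first matching entry, or -1. */
static int32_t table[KEY_COUNT];

void build_table(const Entry *e, size_t n)
{
    for (uint32_t k = 0; k < KEY_COUNT; k++)
        table[k] = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t value = e[i].value, mask = e[i].mask;
        /* Skip entries that cannot match any key in the assumed key space. */
        if ((value & ~mask) || (value & ~KEY_MASK))
            continue;
        uint32_t free_bits = ~mask & KEY_MASK;   /* don't-care bits */
        uint32_t s = free_bits;
        for (;;) {                               /* visit every subset of free_bits */
            uint32_t key = value | s;
            if (table[key] < 0)
                table[key] = (int32_t)i;         /* earlier entries win */
            if (s == 0)
                break;
            s = (s - 1) & free_bits;
        }
    }
}

/* Returns the index of the first matching entry, or -1. */
long lookup(uint32_t key)
{
    assert(key < KEY_COUNT);
    return table[key];
}
```

A lookup is then a single array read. The cost moves into preprocessing, which visits 2^(number of don't-care bits) keys per entry.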

[*] Without any preprocessing, obviously all you can do is check every array member until either you find a match or else you've checked everything.

You don't need to have 32 sets divided up one per key bit, either - you can group your key bits. For example, if you group key bits into 2s, then each group of 2 key bits corresponds to 4 sets, so you only need to calculate the intersection of 16 sets. Grouping key bits into 8s might be the best tradeoff - you have 1024 sets in total, each byte of the key selects one of 256 sets, and you then calculate the intersection of 4 sets. — caf, Jul 06 '11 at 04:55
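A sketch of the byte-grouped variant described in this comment, assuming a 32-bit key split into 4 bytes, unsigned `value` and `mask`, and a fixed capacity. The names `build_byte_index` and `find_entry_bytes` are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t value;   /* assumption: unsigned 32-bit instead of int */
    uint32_t mask;
    int otherData;
} Entry;

#define MAX_ENTRIES 10240                /* assumed capacity, multiple of 64 */
#define WORDS       (MAX_ENTRIES / 64)

/* sets[g][k]: bit i is set when entry i accepts the value k in key byte g. */
static uint64_t sets[4][256][WORDS];

void build_byte_index(const Entry *e, size_t n)
{
    assert(n <= MAX_ENTRIES);
    memset(sets, 0, sizeof sets);
    for (size_t i = 0; i < n; i++) {
        if (e[i].value & ~e[i].mask)
            continue;                    /* can never match any key */
        for (int g = 0; g < 4; g++) {
            uint32_t m = (e[i].mask >> (8 * g)) & 0xFF;
            uint32_t v = (e[i].value >> (8 * g)) & 0xFF;
            for (uint32_t k = 0; k < 256; k++)
                if ((k & m) == v)
                    sets[g][k][i / 64] |= (uint64_t)1 << (i % 64);
        }
    }
}

/* Returns the index of the first matching entry, or -1 if none matches. */
long find_entry_bytes(uint32_t key)
{
    /* Each key byte selects one of 256 sets in its group. */
    const uint64_t *s0 = sets[0][key & 0xFF];
    const uint64_t *s1 = sets[1][(key >> 8) & 0xFF];
    const uint64_t *s2 = sets[2][(key >> 16) & 0xFF];
    const uint64_t *s3 = sets[3][(key >> 24) & 0xFF];
    for (size_t w = 0; w < WORDS; w++) {
        uint64_t acc = s0[w] & s1[w] & s2[w] & s3[w];
        if (acc) {
            unsigned bit = 0;
            while (!(acc & 1)) {         /* locate the lowest set bit */
                acc >>= 1;
                bit++;
            }
            return (long)(w * 64 + bit);
        }
    }
    return -1;
}
```

With about 10000 entries this table takes roughly 1.3 MB (1024 sets of 160 words each) in exchange for 3 `&` operations per 64 entries.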

score 2 · Answer 2 · answered Jul 05 '11 at 13:58

Since you don't have extra information (for example, that the array is sorted) you need a linear search - traverse the array and check the condition - pseudocode:

for( size_t index = 0; index < arraySize; index++ ) {
   if( ( array[index].mask & key ) == array[index].value ) {
      return index;
   }
}
return -1;

Lightness Races in Orbit · Answer 3 · 2011-07-05T14:10:49.923

If you instead had a map of keys to Entries, then this would be really easy.
~~If your array were sorted by key, then you could do a lexicographic binary search with some small effort.~~ [actually, maybe not!]
As it is, you're just going to have to traverse the array until you find what you're looking for. That is, iterate from start to end and stop when you find it.

As an aside, this is a great example of how a choice of data structure affects the availability of algorithms down the line. You can't just throw algorithms at a problem if you picked the wrong data structures in the first place!

edited Jul 05 '11 at 14:10

answered Jul 05 '11 at 14:00

I can't follow you.. how is this supposed to work with a bit mask like `b101`? – Karoly Horvath Jul 05 '11 at 14:02
@yi_H: You know how to do a comparison, because you showed us in your question! Apply the comparison to each array element in turn until it reveals **great success**. – Lightness Races in Orbit Jul 05 '11 at 14:04
that was Serge.. I asked you how could lexicographic binary search work with a bit mask like that – Karoly Horvath Jul 05 '11 at 14:08
@yi_H: Oh, I'm sorry; mistook you for the OP there. In hindsight, perhaps a binary search wouldn't be so easy. – Lightness Races in Orbit Jul 05 '11 at 14:10

score 0 · Answer 4 · answered Jul 05 '11 at 14:02

A linear search would of course work, but if you need many lookups with the same key, you could try sorting the range first according to (key & mask). If you only have a few, fixed keys, you could try using a Boost.MultiIndex container, with one index for each key value.

score 0 · Answer 5 · answered Jul 05 '11 at 14:14

If the mask varies arbitrarily for each entry, I don't see much alternative to a linear search. If there are significant constraints on mask, such that only a few values are possible, it might be better to use some sort of map for each value of mask, doing a linear search to find the first map which contained the value you are looking for. Alternatively, if the masks only concern a few bits, it may be worth using a multimap, ordered by value masked with an and of all of the masks, and indexed with key handled the same, then a linear search using the full key to find the exact match.

score 0 · Answer 6 · answered Jul 05 '11 at 21:30

If the number of zero bits in your mask is small, you could duplicate the entry for each "don't-care" bit in the mask. For example if value=0 and mask=0xfffe then you'd put an entry in the table for key=0 and key=1. For value=0 and mask=0xfeef, put 4 entries in the table: key=0x0000, key=0x0010, key=0x0100, and key=0x0110. Now you can sort the entries and use a binary search, or use a binary search structure such as std::map.
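A rough sketch of this expansion, assuming 32-bit unsigned `value` and `mask` and an arbitrary cap of 16 don't-care bits per entry. `Expanded`, `expand_entries`, `find_expanded` and `MAX_FREE_BITS` are made-up names. With 32-bit fields the masks from the example become 0xFFFFFFFE and 0xFFFFFEEF:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t value; uint32_t mask; int otherData; } Entry;
typedef struct { uint32_t key; size_t index; } Expanded;

#define MAX_FREE_BITS 16   /* assumed cap on don't-care bits per entry */

static unsigned count_bits(uint32_t x)
{
    unsigned c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Order by key, then by original index so the first entry wins ties. */
static int cmp_expanded(const void *a, const void *b)
{
    const Expanded *x = a, *y = b;
    if (x->key != y->key)
        return x->key < y->key ? -1 : 1;
    return (x->index > y->index) - (x->index < y->index);
}

static int cmp_key(const void *k, const void *elem)
{
    uint32_t key = *(const uint32_t *)k;
    const Expanded *x = elem;
    return (key > x->key) - (key < x->key);
}

/* Returns a malloc'd table sorted by key with one row per distinct key,
   or NULL on allocation failure. The caller frees it. */
Expanded *expand_entries(const Entry *e, size_t n, size_t *out_count)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (e[i].value & ~e[i].mask)
            continue;                       /* can never match any key */
        unsigned free_count = count_bits(~e[i].mask);
        assert(free_count <= MAX_FREE_BITS);
        total += (size_t)1 << free_count;
    }
    Expanded *t = malloc((total ? total : 1) * sizeof *t);
    if (!t)
        return NULL;
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t free_bits = ~e[i].mask;
        if (e[i].value & free_bits)
            continue;
        uint32_t s = free_bits;
        for (;;) {                          /* every subset of the don't-care bits */
            t[k].key = e[i].value | s;
            t[k].index = i;
            k++;
            if (s == 0)
                break;
            s = (s - 1) & free_bits;
        }
    }
    qsort(t, k, sizeof *t, cmp_expanded);
    size_t u = 0;                           /* drop later duplicates of a key */
    for (size_t j = 0; j < k; j++)
        if (u == 0 || t[u - 1].key != t[j].key)
            t[u++] = t[j];
    *out_count = u;
    return t;
}

/* Returns the index of the first matching entry, or -1. */
long find_expanded(const Expanded *t, size_t count, uint32_t key)
{
    const Expanded *hit = bsearch(&key, t, count, sizeof *t, cmp_key);
    return hit ? (long)hit->index : -1;
}
```

Duplicates are removed after sorting because `bsearch` does not say which of several equal elements it returns.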
