I have an array of elements. The array is sorted by the ID of the elements, but the IDs are non-sequential, i.e. there are gaps in the ID numbers.

I use binary search today to find a specific ID.

The ID is 3 bytes, which gives about 16 million possibilities. The number of IDs in a given array is much lower, maybe 10 000.

This is an embedded/PLC platform, which means I can't have a 16 MB lookup table; that takes too much memory. I've looked at bitsets and such, but I'm not sure if that's the right approach, or how to calculate an array offset from that.

I realize this may be a tough one given that I want the good old "memory for speed" trade-off, but I have very little memory, maybe 2 MB or less to spare for this. And the hardware is fixed.

Edit: The elements of the array are fixed for a given application, no inserts or deletions of array elements.

How can I build/precompute a lookup table or similar to speed up locating an ID?

Thanks

krakers
  • Can you change the structure of the array elements? And I can't see why you need to search for the ID - if you know the ID you can just access the element with `myarray[ID_I_KNOW]` – Sergey Romanov Dec 16 '18 at 05:09

2 Answers


I am assuming that the binary search is too slow. Since the table is fixed, with no additions or deletions at run time, you can look at a "perfect hash" solution. Wikipedia has a really good article explaining this: https://en.wikipedia.org/wiki/Perfect_hash_function

Basically, you run the table through a perfect hash generator offline; then, at run time, you run the ID through the offline-generated formula to get the index of the item in the table.
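As a concrete illustration of the "generate offline, index at run time" idea, here is a sketch of one simple flavour of this: brute-forcing a seed under which an ordinary integer hash maps every ID collision-free into a small table. It is in Python since the generator runs on a PC anyway; the tiny ID set, the table size, and the hash mix are all made up for the demo, standing in for the ~10 000 real IDs. On the PLC side, a lookup then costs one hash, one table read, and one ID compare.

```python
TABLE_SIZE = 16  # power of two >= number of IDs; bigger makes seeds easier to find

# Demo stand-ins for the real 3-byte IDs.
ids = [0x000102, 0x0000FF, 0x010000, 0x02A0B3,
       0x0FFFFF, 0x123456, 0x3344AA, 0xFEDCBA]

def slot(id_, seed):
    """Cheap 32-bit integer mix; any decent hash works here."""
    h = (id_ * 2654435761 ^ seed) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 2246822519) & 0xFFFFFFFF
    h ^= h >> 16
    return h & (TABLE_SIZE - 1)

def find_seed(ids):
    """Offline step: search for a seed under which no two IDs collide."""
    for seed in range(100000):
        if len({slot(i, seed) for i in ids}) == len(ids):
            return seed
    raise RuntimeError("no collision-free seed found; grow TABLE_SIZE")

seed = find_seed(ids)

# Offline step: slot -> array-index table, burned into the application image.
slot_to_index = [-1] * TABLE_SIZE
for i, id_ in enumerate(ids):
    slot_to_index[slot(id_, seed)] = i

def lookup(id_):
    """Runtime step: one hash, one table read, one compare; -1 if absent."""
    i = slot_to_index[slot(id_, seed)]
    return i if i >= 0 and ids[i] == id_ else -1
```

On the target, only `slot`, the found seed, and `slot_to_index` (16 small entries here, a few tens of KB for 10 000 IDs) need to exist, well within the 2 MB budget.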

malaugh
  • I accept this answer, because even though I didn't use perfect hashing in my solution it gave me the right direction. I ended up using robin hood hashing. – krakers Dec 30 '18 at 11:08
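For reference, the robin hood hashing the asker settled on can be sketched roughly as below (Python for brevity; the table size, hash mix, and names are illustrative, not the asker's actual code). It is ordinary open addressing with linear probing, except that on a collision the resident entry closer to its home slot yields to the arriving entry that is further from its own, which evens out probe lengths and lets lookups stop early.

```python
TABLE_SIZE = 32
table = [None] * TABLE_SIZE  # each slot holds (id, payload) or None

def home(id_):
    """Preferred slot for an ID."""
    return (id_ * 2654435761) % TABLE_SIZE

def probe_distance(pos, id_):
    """How far slot pos is from the ID's home slot."""
    return (pos - home(id_)) % TABLE_SIZE

def insert(id_, payload):
    pos, dist = home(id_), 0
    entry = (id_, payload)
    while True:
        if table[pos] is None:
            table[pos] = entry
            return
        their_dist = probe_distance(pos, table[pos][0])
        if their_dist < dist:  # resident is "richer": it moves on instead
            table[pos], entry = entry, table[pos]
            dist = their_dist
        pos = (pos + 1) % TABLE_SIZE
        dist += 1

def lookup(id_):
    pos, dist = home(id_), 0
    # Stop early once a resident is closer to home than we are: our ID
    # would have displaced it during insertion, so it cannot be further on.
    while table[pos] is not None and probe_distance(pos, table[pos][0]) >= dist:
        if table[pos][0] == id_:
            return table[pos][1]
        pos = (pos + 1) % TABLE_SIZE
        dist += 1
    return None
```

Since the ID set is fixed, the table can be built offline exactly like the perfect-hash table, just with a tolerance for short probe chains instead of none.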

You only need the sorted table of entries that actually have IDs to begin with. The code can build an index of those IDs for you, and use the index with binary search for lookup. The index will be 40 KB (10 000 four-byte values). You can probably spare that much. It could be shrunk to 30 KB if the IDs are packed as true 3-byte values, but that'd be an unnecessary complication unless you really are 10 KB short.

A hash could forgo the index, but are the space savings worth it? And if the entries are much larger than their IDs, it won't take many vacant table slots to use up the savings.

VAR_GLOBAL CONSTANT
  entryCount : DINT := 10000;
END_VAR
VAR_GLOBAL
  entries : ARRAY[1..entryCount] OF ST_Entry := ...; // you need to preinitialize this array here
  index : ARRAY[1..entryCount] OF DINT;
  // initializer trick: forces BuildIndex to run once during PLC initialization
  _dummy : BOOL := BuildIndex(ADR(index), ADR(entries), entryCount);
END_VAR

// Called once during PLC initialization only. Returns FALSE always.
FUNCTION BuildIndex : BOOL
VAR_INPUT
  index : POINTER TO DINT;
  entries : POINTER TO ST_Entry;
  count : DINT;
END_VAR
// Pointer index access is 0-based: copy IDs at offsets count-1 down to 0
WHILE count > 0 DO
  count := count - 1;
  index[count] := entries[count].Id;
END_WHILE
BuildIndex := FALSE;
END_FUNCTION

With this setup, an indexed lookup via binary search is easy:

FUNCTION LookupEntry : REFERENCE TO ST_Entry
VAR_INPUT
  id : DINT;
END_VAR
VAR
  begin : DINT := 1;
  mid : DINT;
  end : DINT := GVL.entryCount;
  midId : DINT;
END_VAR
// Standard binary search over the sorted ID index
WHILE begin <= end DO
  mid := (begin + end) / 2;
  midId := GVL.index[mid];
  IF midId = id THEN
    LookupEntry REF= GVL.entries[mid];
    RETURN;
  ELSIF midId < id THEN
    begin := mid + 1;
  ELSE
    end := mid - 1;
  END_IF
END_WHILE
// may return an invalid reference, use of reference will throw
END_FUNCTION
Kuba hasn't forgotten Monica