There is an arbitrary number of distinct unsigned integer values within a known range.
The number of values is much smaller than (<<) the number of integers in the range.
I want to build a data structure which allows the following runtime complexities:
- Insertion in O(1)
- After insertion is done:
  - Deletion in O(1)
  - Get all values within a query range in O(k), with k being the number of result values (returned values do not have to be sorted)
Memory complexity is not restricted. However, an astronomically large amount of memory is not available ;-)
Here is an example:
- range = [0, 1023]
- insert 42
- insert 350
- insert 729
- insert 64
- insert 1
- insert 680
- insert 258
- find values in [300;800] ; returns {350, 729, 680}
- delete 350
- delete 680
- find values in [35;1000] ; returns {42, 258, 64, 729}
- delete 42
- delete 258
- find values in [0; 5] ; returns {1}
- delete 1
Is such a data structure even possible (with the aid of look-up tables, etc.)?
An approximation I thought about would be:
Bin the inserted values into buckets. 0..31 => bucket 0, 32..63 => bucket 1, 64..95 => bucket 2, 96..127 => bucket 3, ...
Insertion: find the bucket id using simple shifting arithmetic, then append the value to that bucket's array.
Find: find the bucket ids of the start and end points using shifting arithmetic. Look through all values in the first and last bucket and keep only those that lie within the query range. Add all values in all intermediate buckets to the search result without checking them.
Delete: find the bucket id using shifting. Locate the value in the bucket (at most 32 entries, since values are distinct), swap it with the last value in the bucket, then decrement the count for this bucket.
Downside 1: if many queries span fewer than 32 values, the whole bucket will be searched every time.
Downside 2: if there are empty buckets within the query range, they will also be visited during the search phase.