Efficient mapping of ranges to groups of values

Question

I'm attempting to determine a suitable way to accomplish the following.

I would like to have range -> set lookup within a particular range (say [0x0 - 0xffffffff]). Values are inserted into the range at ranges (so if we are working with T = unique strings), I might insert("beef3490", [0x1234, 0xFFFF]); and a single id may have multiple inserts, which may or may not overlap. Additionally, other values may be inserted which overlap, and where they overlap I should receive the set of unique strings as a result. Lastly, values can also be removed from ranges (not necessarily matching, but typically contained within their initial insert range).

Here is a simplified example usage:

insert("beef3490", [0x1234, 0xFFFF])
insert("beef3490", [0x11111, 0x22222])
insert("flam1456", [0x8000, 0xA0000])
remove("beef3490", [0x21000, 0x22000])
lookup(0x0) -> set<>
lookup(0x2000) -> set<beef3490>
lookup(0x9456) -> set<beef3490, flam1456>
lookup(0x21212) -> set<flam1456>
lookup(0x99789) -> set<flam1456>

This leads to a number of questions for me:

Is there a generalized name for this sort of problem, or a similar type of problem that I could find insight from?

What is an ideal implementation given the following constraints:

Fast/very-fast lookup time

Memory usage does not bloat (ie. the following operations have equivalent footprint)

Insert [10,20], Insert[20,30], Remove[14,24]

Insert [10,15], Insert[25,30]

Similar to last, the data structure should have stability on a long running system

Insert/delete times are not absolutely terrible (they don't need to be as fast as lookup though)

Given an ideal implementation, do you have advice for using it in C++

Thanks for all responses and help.

@andre: sadly, this does the look-up in the wrong direction! — Dietmar Kühl, Sep 02 '13 at 20:42
I'd think the data structure looks somewhat like a [_range tree_](http://en.wikipedia.org/wiki/Range_tree). — Dietmar Kühl, Sep 02 '13 at 20:44
An [interval tree](http://en.wikipedia.org/wiki/Interval_tree) mapping to sets of strings? — nickie, Sep 02 '13 at 20:47
@DietmarKühl: I see similarities, but a range tree seems to primarily be for doing efficient range lookup (ie. lookup(0x100, 0xFFF)) whereas this structure would only ever be looking up a single key, and expecting the matching values. — Kestred, Sep 02 '13 at 20:49
@KevinStenerson: It would still work but you are right, it seems [_segment trees_](http://en.wikipedia.org/wiki/Segment_tree) are a slightly less complex version used to locate all segments containing a point. The page for range trees points at segment trees... — Dietmar Kühl, Sep 02 '13 at 20:53
Duplicate? http://stackoverflow.com/questions/2147505/a-dictionary-object-that-uses-ranges-of-values-for-keys — Adam Burry, Sep 03 '13 at 02:14

score 1 · Accepted Answer · answered Sep 03 '13 at 02:42

Is there a generalized name for this sort of problem, or a similar type of problem that I could find insight from?

This problem is an interval-tree or segment-tree problem. Specifically, the tree/data-structure needs to perform aggregate on overlap operations. This means that when two intersecting ranges are inserted into the structure, they lookup value for a point within both ranges is equivalent to val(range A) + val(range B). The '+' operation could be addition if the values were integers, or in the case of sets it would be a union operation.

What is an ideal implementation given the constraints

A self-balancing Binary Search Tree (such as a Red-black tree, or Scapegoat tree) built to optimize searches based on ranges. Insert operations would additionally intersecting ranges as necessary to produce the correct return values. The ways in which the ranges are split varies by your requirements but is typically either through "joining" which discards the information about the original ranges but has a smaller footprint or "splitting" from which the original ranges can still be deduced.

Given an ideal implementation, do you have advice for using it in C++

See boost::icl. (Boost interval container library)

Efficient mapping of ranges to groups of values

1 Answers1