
I'm wondering if there is something like HashSet, but keyed by a range of values.

For example, we could add an item which is keyed to all integers between 100 and 4000. This item would be returned if we used any key between 100 and 4000, e.g. 287.

I would like the lookup speed to be quite close to HashSet, i.e. O(1). It would be possible to implement this using a binary search, but this would be too slow for the requirements. I would like to use standard .NET API calls as much as possible.

Update

This is interesting: https://github.com/mbuchetics/RangeTree

It has a time complexity of O(log(N)) where N is number of intervals, so it's not exactly O(1), but it could be used to build a working implementation.

Contango
  • Why not just use HashSet with a comparer that matched the logic needed? – Jon Hanna Aug 22 '16 at 10:41
  • But a HashSet does not have a key. – paparazzo Aug 22 '16 at 10:54
  • Is anything known about the ranges or can they be anything? – Chris Aug 22 '16 at 10:57
  • @Chris Yes. The ranges will never overlap (as that would lead to an ambiguous result). However, the range could be any span of values that can be specified by two longs. – Contango Aug 22 '16 at 11:05
  • If I understand correctly, you need many keys (a range of integers) to link to the same item? If that's correct I may have a solution, but I'm not sure I understood correctly. – Martin Verjans Aug 22 '16 at 11:15
  • @SuperPeanut Yes, we need to effectively link lots of keys to the same item. For example, if the range was between 5 and 8, then keys 5, 6, 7 and 8 would all bring back the item. It gets more interesting if the range is 1 million to 100 million :) – Contango Aug 22 '16 at 12:18
  • Possible duplicate of [Data Structure to store Integer Range , Query the ranges and modify the ranges](http://stackoverflow.com/questions/18948351/data-structure-to-store-integer-range-query-the-ranges-and-modify-the-ranges) – Jackson Aug 22 '16 at 12:22

2 Answers


I don't believe there's a structure for it already. You could implement something like a RangedDictionary:

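using System;
using System.Collections.Generic;

class RangedDictionary {

   private Dictionary<Range, int> _set = new Dictionary<Range, int>();

   public void Add(Range r, int value) {
      _set.Add(r, value);
   }

   public int Get(int key) {
      //find a range that includes that key and return _set[range]
      //(a plain linear scan over the stored ranges, so this is O(N), not O(1))
      foreach (KeyValuePair<Range, int> pair in _set) {
         if (pair.Key.Begin <= key && key <= pair.Key.End) return pair.Value;
      }
      throw new KeyNotFoundException("No range contains " + key);
   }
}

struct Range : IEquatable<Range> {
   public int Begin;
   public int End;
   //override GetHashCode() and Equals() methods so that you can index a Dictionary by Range
   public bool Equals(Range other) { return Begin == other.Begin && End == other.End; }
   public override bool Equals(object obj) { return obj is Range && Equals((Range)obj); }
   public override int GetHashCode() { return (Begin * 397) ^ End; }
}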

EDIT: changed HashSet to Dictionary

suwik
  • I can't find any references to a `HashSet` in the msdn docs. Am I missing something or is this from a third party library or did you actually mean to use a dictionary here? – Chris Aug 22 '16 at 10:51
  • Thanks for the answer. However, this seems to require that you know the range when you want to retrieve an item - but we don't know the range at that point! – Contango Aug 22 '16 at 10:52
  • Also it is worth noting that the process to "find a range that includes that key and return _set[range]" is not going to be O(1) as the OP was requesting I think... – Chris Aug 22 '16 at 10:52
  • @Chris: yep, I meant a dictionary. Well, we wouldn't need to know the ranges when retrieving. We'd only need to come up with a way of retrieving a range(ranges?) based on the key. And yep - this could be more than O(1), but I cannot see any better solution here. (Unless you have a predefined pattern for ranges like 1-100, 101-200, etc..) – suwik Aug 22 '16 at 11:03
  • Found out there's a dedicated data structure (although probably not implemented in standard .NET apis) for doing exactly that. It's a segment tree. It finds a range for a value in logarithmic complexity (well, just as bin-search would do here). However the ranges can overlap in a segment tree. Can OP's restriction that the ranges cannot overlap help us reduce the complexity? Not sure about that. – suwik Aug 22 '16 at 11:30

Here is a solution you can try out. However, it makes two assumptions:

  • No two ranges overlap
  • Every number you look up actually lies inside one of the ranges (there is no error check)

As written, the lookup is O(N), but I think you can make it O(log(N)) with little effort.

The idea is that a class handles the ranges: it converts any value given to it into the lower boundary of the range that contains it. This way your hash table (here a Dictionary) uses the lower boundaries as keys.

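using System.Collections.Generic;

public class Range
{
    //We store the low boundary of every range we have, in ascending order
    private static List<int> ranges = new List<int>();
    public int value { get; set; }

    //RangeStop is not stored: ranges never overlap and every lookup is
    //assumed to fall inside a range, so the low boundaries are enough
    public static void CreateRange(int RangeStart, int RangeStop)
    {
        ranges.Add(RangeStart);
        ranges.Sort();
    }

    public Range(int value)
    {
        int previous = ranges[0];
        //Here we will find the range and give it the low boundary
        //This is a very simple foreach loop but you can make it better
        foreach (int item in ranges)
        {
            if (item > value)
            {
                break;
            }
            previous = item;
        }
        this.value = previous;
    }

    //Two Range objects are equal when they resolve to the same low boundary
    public override bool Equals(object obj)
    {
        Range other = obj as Range;
        return other != null && other.value == value;
    }

    public override int GetHashCode()
    {
        return value;
    }
}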

Here is a small program to test it.

class Program
{
    static void Main(string[] args)
    {
        Dictionary<int, int> myRangedDic = new Dictionary<int,int>();
        Range.CreateRange(10, 20);
        Range.CreateRange(50, 100);

        myRangedDic.Add(new Range(15).value, 1000);
        myRangedDic.Add(new Range(75).value, 5000);

        Console.WriteLine("searching for 16 : {0}", myRangedDic[new Range(16).value].ToString());
        Console.WriteLine("searching for 64 : {0}", myRangedDic[new Range(64).value].ToString());

        Console.ReadLine();
    }
}

I don't believe you can really go below O(log(N)): there is no way to know immediately which range a number falls in, so you must always compare it with a lower (or upper) bound.

If you had predetermined ranges, this would be easier. For example, if every range spans exactly one hundred values, you can find the lower boundary of any number's range by subtracting the number modulo 100 from it. Here we can assume nothing about the ranges, so we must search.
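As a rough sketch of the predetermined-range case: if every range has the same width and starts at a multiple of that width, the lower boundary can be computed arithmetically and used as an ordinary Dictionary key, which gives a hash lookup with no searching. The class name FixedWidthRangeMap and its members are hypothetical, made up for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a .NET type): maps fixed-width ranges such as
// [0..99], [100..199], ... to items, keyed by each range's lower boundary.
public class FixedWidthRangeMap<T>
{
    private readonly long _width;
    private readonly Dictionary<long, T> _items = new Dictionary<long, T>();

    public FixedWidthRangeMap(long width)
    {
        if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));
        _width = width;
    }

    // Lower boundary of the fixed-width range that contains key.
    private long LowerBound(long key)
    {
        long remainder = key % _width;
        if (remainder < 0) remainder += _width; // C# % can be negative for negative keys
        return key - remainder;
    }

    // Registers an item for the whole range that contains anyKeyInRange.
    public void Add(long anyKeyInRange, T item)
    {
        _items.Add(LowerBound(anyKeyInRange), item);
    }

    // Looks up the item of the range containing key; false if none was added.
    public bool TryGet(long key, out T item)
    {
        return _items.TryGetValue(LowerBound(key), out item);
    }
}
```

With a width of 100, adding an item under key 150 makes it retrievable with any key from 100 to 199. This only works because the boundaries follow a fixed pattern; it does not apply to the arbitrary ranges in the question.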

To get down to O(log(N)) with this solution, replace the foreach with a binary search: look at the middle of the sorted list, then halve the search interval on every iteration.
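One way the binary-search version might look is sketched below, using the standard List<T>.BinarySearch method instead of a hand-written loop. This is not the code from the answer above: the class name RangeLookup and its members are hypothetical, it stores the upper boundaries as well so that keys outside every range can be rejected, and it uses long boundaries as mentioned in the question's comments.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: non-overlapping inclusive ranges, found by binary search.
public class RangeLookup<T>
{
    // Parallel lists, kept sorted by range start.
    private readonly List<long> _starts = new List<long>();
    private readonly List<long> _ends = new List<long>();
    private readonly List<T> _items = new List<T>();

    public void Add(long start, long end, T item)
    {
        if (start > end) throw new ArgumentException("start must not exceed end");
        int index = _starts.BinarySearch(start);
        if (index >= 0) throw new ArgumentException("range overlaps an existing range");
        index = ~index; // insertion point that keeps _starts sorted
        bool overlapsPrevious = index > 0 && _ends[index - 1] >= start;
        bool overlapsNext = index < _starts.Count && _starts[index] <= end;
        if (overlapsPrevious || overlapsNext)
            throw new ArgumentException("range overlaps an existing range");
        _starts.Insert(index, start);
        _ends.Insert(index, end);
        _items.Insert(index, item);
    }

    public bool TryGet(long key, out T item)
    {
        int index = _starts.BinarySearch(key);
        // A negative result is the bitwise complement of the index of the next
        // larger start, so the candidate range is the one just before it.
        if (index < 0) index = ~index - 1;
        if (index >= 0 && key <= _ends[index])
        {
            item = _items[index];
            return true;
        }
        item = default(T);
        return false;
    }
}
```

For example, after Add(100, 4000, "item"), a call to TryGet(287, out value) finds the item, while TryGet(50, out value) returns false. Lookup is O(log(N)) in the number of ranges; Add is O(N) because inserting into the middle of a List shifts the later elements.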

Martin Verjans