5

I'd like to have a data structure with <Key, Value> where
  Key = (start, end), 
  Value = string

I should then be able to look up an integer efficiently in the data structure and get the value of the range that contains it.

Example:

var lookup = new Something<(int, int), string>()
{
  {(1,100),"In 100s"},
  {(101,200),"In 100-200"},
};

var value1 = lookup[10]; //value1 = "In 100s"
var value2 = lookup[110]; //value2 = "In 100-200"

Could anyone suggest a suitable data structure?

  • You need to give some example data, we need a [mcve]. – DavidG Feb 12 '19 at 17:25
  • And please try to make it so that the title and description match up a little better. The title makes it sound like you are looking for something like a `Dictionary, SomeType>`, but your question makes it look like the key to the dictionary is some type of 2-tuple (though I can't tell what types `start` and `end` (in your tuple) are) – Flydog57 Feb 12 '19 at 17:27
  • Hi @Peeush Agarwal, welcome to Stack Overflow. Have you tried to implement one algorithm in particular? Have you investigated any options? Please feel free to share any examples. – abestrad Feb 12 '19 at 17:28
  • Do the `(start, end)` intervals overlap? If they do you could look at [interval trees](https://en.wikipedia.org/wiki/Interval_tree), otherwise you could use a `SortedDictionary` keyed on the start. – Lee Feb 12 '19 at 17:31
  • What is a `Something` here? Do you basically have a dictionary? – DavidG Feb 12 '19 at 17:42
  • @Flydog57 Edited my question. Kindly suggest your answer. – Peeush Agarwal Feb 12 '19 at 17:42
  • @DavidG Something is what I'd like to know which I can use here. – Peeush Agarwal Feb 12 '19 at 17:43
  • @Lee intervals do not overlap. I'll have a look at SortedDictionary. Thanks – Peeush Agarwal Feb 12 '19 at 17:45
  • Possible duplicate of [A dictionary object that uses ranges of values for keys](https://stackoverflow.com/questions/2147505/a-dictionary-object-that-uses-ranges-of-values-for-keys) – devNull Feb 12 '19 at 18:13
  • Roughly how many entries are in your collection: a handful, a couple of hundred, thousands, lots? Since they don't overlap (and I'm assuming they are sorted), you just need to walk the collection, looking at the `start` value; once you find one that's too big, you are one step too far. But that's O(N). I can't think of a way to beat O(N). It looks like @devNull's suggestion may be the best. – Flydog57 Feb 12 '19 at 18:43
  • If your intervals do not overlap, you can just store them in a binary search tree and then find a matching `start` in `O(log(N))`. – Yeldar Kurmangaliyev Feb 12 '19 at 20:17

2 Answers

2

If you want to be able to use something like lookup[10] as you mentioned, you can create your own class that implements some sort of key/value data type. Which underlying data type you ultimately decide to use really depends on what your data looks like.

Here's a simple example of doing this by inheriting from Dictionary<>:

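using System.Collections.Generic;
using System.Linq;

public class RangeLookup : Dictionary<(int Min, int Max), string>
{
    // Linear scan: returns the value of the one range containing index; throws if none or several match.
    public string this[int index] => this.Single(x => x.Key.Min <= index && index <= x.Key.Max).Value;
}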

This allows you to define a custom indexer on top of the dictionary to encapsulate your range lookup. A usage of this class would look like:

var lookup = new RangeLookup
{
    { (1, 100), "In 100s" },
    { (101, 200), "In 101-200s" },
};

Console.WriteLine($"50: {lookup[50]}");

Which produces the output:

50: In 100s
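
One caveat with the indexer above: `Enumerable.Single` throws an `InvalidOperationException` when no range contains the index, or when more than one does. If a miss is an expected case, a non-throwing lookup may be more convenient. The following is only a minimal sketch of that idea; the class name `SafeRangeLookup` and the `TryGetValue(int, ...)` overload are hypothetical and not part of the answer's code.

```csharp
using System.Collections.Generic;

// Hypothetical variant of RangeLookup; names are illustrative only.
public class SafeRangeLookup : Dictionary<(int Min, int Max), string>
{
    // Linear scan that reports a miss instead of throwing.
    // Returns the first range containing index (ranges are assumed not to overlap).
    public bool TryGetValue(int index, out string value)
    {
        foreach (var entry in this)
        {
            if (entry.Key.Min <= index && index <= entry.Key.Max)
            {
                value = entry.Value;
                return true;
            }
        }

        value = null;
        return false;
    }
}
```

Usage would look like `if (lookup.TryGetValue(250, out var text)) { ... }`, where a value outside every range simply yields `false`.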


In terms of performance with this approach, the following is an example of some tests (using Win10 with an Intel i7-4770 CPU) retrieving a value from a dictionary with 10,000,000 records:

var lookup = new RangeLookup();

for (var i = 1; i <= 10000000; i++)
{
    var max = i * 100;
    var min = max - 99;
    lookup.Add((min, max), $"In {min}-{max}s");
}

var stopwatch = new Stopwatch();

stopwatch.Start();
Console.WriteLine($"50: {lookup[50]} (TimeToLookup: {stopwatch.ElapsedMilliseconds})");

stopwatch.Restart();
Console.WriteLine($"5,000: {lookup[5000]} (TimeToLookup: {stopwatch.ElapsedMilliseconds})");

stopwatch.Restart();
Console.WriteLine($"1,000,000,000: {lookup[1000000000]} (TimeToLookup: {stopwatch.ElapsedMilliseconds})");

Which gives the following results:

[Image: console output listing the time to look up each of the three values]

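So unless you plan on working with more than tens of millions of records in this data set, an approach like this should be satisfactory in terms of performance. Keep in mind that every lookup is a linear scan, because Single has to examine all entries to confirm that exactly one matches.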

devNull
  • While this works, I'm not a huge fan of the slight abuse of the indexer. It just feels a little janky to me. – DavidG Feb 12 '19 at 20:20
  • @devNull That looks like what I wanted. I can live with the indexer, as it keeps the code simple and abstracts the search. Thanks for sharing the performance timings as well. – Peeush Agarwal Feb 13 '19 at 04:06
1

You basically have a Dictionary<> structure here, for example:

var lookup = new Dictionary<(int, int), string>()
{
  {(1,100),"In 100s"},
  {(101,200),"In 100-200"},
};

You can use some basic LINQ queries to search that container, for example:

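var searchValue = 10;
// First returns a KeyValuePair, so take .Value to get the string ("In 100s")
var value1 = lookup.First(l => l.Key.Item1 <= searchValue && l.Key.Item2 >= searchValue).Value;

searchValue = 110;
// value2 = "In 100-200"
var value2 = lookup.First(l => l.Key.Item1 <= searchValue && l.Key.Item2 >= searchValue).Value;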

But as Lee suggested in the comments, you might get better performance using a SortedDictionary. Your mileage may vary, so you should test the performance of both.
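
To make the sorted idea from the comments concrete: because the intervals do not overlap, sorting them by start lets a binary search find the only candidate range in O(log N). The sketch below uses plain sorted arrays with `Array.BinarySearch` rather than `SortedDictionary`, since `SortedDictionary` has no built-in "largest key not greater than x" lookup. The class name `SortedRangeLookup` and its members are hypothetical, and the code assumes inclusive, non-overlapping ranges that do not change after construction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from either answer.
// Assumes ranges are inclusive and never overlap.
public sealed class SortedRangeLookup
{
    private readonly int[] _starts;    // range starts, ascending
    private readonly int[] _ends;      // inclusive end of the range at the same index
    private readonly string[] _values; // value of the range at the same index

    public SortedRangeLookup(IEnumerable<KeyValuePair<(int Start, int End), string>> ranges)
    {
        var ordered = ranges.OrderBy(r => r.Key.Start).ToArray();
        _starts = ordered.Select(r => r.Key.Start).ToArray();
        _ends = ordered.Select(r => r.Key.End).ToArray();
        _values = ordered.Select(r => r.Value).ToArray();
    }

    // Returns true and sets value when some range contains key.
    public bool TryGetValue(int key, out string value)
    {
        int i = Array.BinarySearch(_starts, key);

        // A negative result is the bitwise complement of the index of the
        // first start greater than key, so step back to the last start below key.
        if (i < 0) i = ~i - 1;

        if (i >= 0 && key <= _ends[i])
        {
            value = _values[i];
            return true;
        }

        value = null;
        return false;
    }
}
```

An instance can be built directly from the dictionary shown above, for example `new SortedRangeLookup(lookup)`, and queried with `TryGetValue(110, out var text)`. Whether this beats the LINQ scan in practice depends on the number of ranges, so it is still worth measuring.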

DavidG