4

I want to store objects indexed by a 3-tuple of (String, String, DateTime). Let's call these Identifier, Category, and Day.

Any object in the data structure is guaranteed to be unique by the 3-tuple (no duplicates).

The data structure should support fast answers to questions such as:
- What are all the unique identifiers?
- What are the categories for identifier "xyz"?
- What are the days where identifier = "xyz" and category is "mycategory"?

Removal must also be supported, and it would be great to maintain a low memory profile.

As a baseline, I'm using Dictionary<string, Dictionary<string, Dictionary<DateTime, object>>>.

Theoretically this should give me O(1) retrieval, but I'm not familiar with the internals of Dictionary and generally I have a feeling that my solution is sub-optimal.

I know there's probably no one right answer here and I could provide numerous usage details, but perhaps someone can just give me a few ideas to play with?

Edit
The only retrieval performed is by equality (i.e. identifier = "xyz"). I don't use inequalities (greater-than, less-than, etc.).

Suraj
  • Can Identifiers be arbitrary strings or is there a constraint on them (always a given length or less, etc)? – Roman Apr 11 '11 at 22:08
  • As implemented there is no constraint, but in practice identifiers will never be longer than 10 characters (and if needed I could constrain to that) – Suraj Apr 11 '11 at 22:13

2 Answers

1

It depends on the relative numbers of values in each column, their distribution, and the distribution of queries, so there's no best answer.

Your dictionaries are fine for retrieval along one dimension, but you will have to linearly search if you want a combination of features.

If space weren't a problem, you could build a 3-level index (using either trees or hash tables): first retrieve the items along dimension 1, then use a dictionary at that node to find all items that also match the value for dimension 2, then use a dictionary at that node to find the items that match all 3 values.
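
As a rough illustration of the 3-level index described above, here is a minimal sketch built on the nested dictionaries from the question. The class name `ThreeLevelIndex` and its method names are hypothetical, chosen only for this example, and the code assumes the equality-only lookups mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper around the nested dictionaries; all names are illustrative.
public class ThreeLevelIndex<TValue>
{
    private readonly Dictionary<string, Dictionary<string, Dictionary<DateTime, TValue>>> _root =
        new Dictionary<string, Dictionary<string, Dictionary<DateTime, TValue>>>();

    public void Add(string identifier, string category, DateTime day, TValue value)
    {
        Dictionary<string, Dictionary<DateTime, TValue>> byCategory;
        if (!_root.TryGetValue(identifier, out byCategory))
        {
            byCategory = new Dictionary<string, Dictionary<DateTime, TValue>>();
            _root.Add(identifier, byCategory);
        }

        Dictionary<DateTime, TValue> byDay;
        if (!byCategory.TryGetValue(category, out byDay))
        {
            byDay = new Dictionary<DateTime, TValue>();
            byCategory.Add(category, byDay);
        }

        // Dictionary.Add throws ArgumentException if the 3-tuple already exists.
        byDay.Add(day, value);
    }

    // "What are all the unique identifiers?"
    public IEnumerable<string> Identifiers()
    {
        return _root.Keys;
    }

    // "What are the categories for identifier X?"
    public IEnumerable<string> Categories(string identifier)
    {
        Dictionary<string, Dictionary<DateTime, TValue>> byCategory;
        return _root.TryGetValue(identifier, out byCategory)
            ? byCategory.Keys
            : Enumerable.Empty<string>();
    }

    // "What are the days for identifier X and category Y?"
    public IEnumerable<DateTime> Days(string identifier, string category)
    {
        Dictionary<string, Dictionary<DateTime, TValue>> byCategory;
        Dictionary<DateTime, TValue> byDay;
        if (_root.TryGetValue(identifier, out byCategory) &&
            byCategory.TryGetValue(category, out byDay))
        {
            return byDay.Keys;
        }
        return Enumerable.Empty<DateTime>();
    }

    public bool Remove(string identifier, string category, DateTime day)
    {
        Dictionary<string, Dictionary<DateTime, TValue>> byCategory;
        Dictionary<DateTime, TValue> byDay;
        if (!_root.TryGetValue(identifier, out byCategory) ||
            !byCategory.TryGetValue(category, out byDay) ||
            !byDay.Remove(day))
        {
            return false;
        }

        // Drop inner dictionaries once they are empty so they do not hold memory.
        if (byDay.Count == 0) byCategory.Remove(category);
        if (byCategory.Count == 0) _root.Remove(identifier);
        return true;
    }
}
```

Each of the three questions costs a bounded number of hash lookups on average, followed by enumerating the keys at that level. Pruning empty inner dictionaries on removal is one possible way to keep the memory footprint down; whether it matters depends on how often identifiers and categories become empty.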

It also matters if you want to answer queries using inequalities. In this case, a tree is better than a dictionary because it is ordered.
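
To make the ordering point concrete, here is a small sketch of a range query over the days of a single (identifier, category) pair. It assumes the innermost level is kept in a `SortedSet<DateTime>`, the ordered set added in .NET 4.0, instead of a hash-based dictionary. The helper name `DaysBetween` is made up for this example, and the question as edited only needs equality lookups, so this applies only if range queries are required later.

```csharp
using System;
using System.Collections.Generic;

public static class DayRangeSketch
{
    // Returns the stored days in [first, last], both ends inclusive, in ascending order.
    // SortedSet<T>.GetViewBetween throws ArgumentException if first is later than last.
    public static IEnumerable<DateTime> DaysBetween(
        SortedSet<DateTime> days, DateTime first, DateTime last)
    {
        return days.GetViewBetween(first, last);
    }
}
```

A hash-based `Dictionary<DateTime, object>` cannot answer the same query without scanning every key, which is the trade-off the paragraph above refers to.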

Larry Watanabe
  • Probably a good choice, that's what I would have done (in the absence of further information). An example where you *might* want to have 3 nested indices is if the fields themselves have a nested or hierarchical structure, like "country", "city", "street". – Larry Watanabe Apr 12 '11 at 21:42
0

Since you added the .NET 4.0 tag, I assume you are on .NET 4.0, so why not use a Dictionary<Tuple<T1,T2,T3>,object>?
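
A minimal sketch of what the tuple-keyed dictionary could look like, assuming the key is `Tuple<string, string, DateTime>`. The helper names `Key` and `Categories` are hypothetical. `Tuple` implements structural `Equals` and `GetHashCode`, so an exact lookup by all three parts works directly, while a query on a single part has to scan every key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TupleKeySketch
{
    // Builds the composite key; two tuples with equal items compare equal.
    public static Tuple<string, string, DateTime> Key(
        string identifier, string category, DateTime day)
    {
        return Tuple.Create(identifier, category, day);
    }

    // A per-dimension query must visit every key, so it is O(n) in the number of entries.
    public static IEnumerable<string> Categories(
        Dictionary<Tuple<string, string, DateTime>, object> store, string identifier)
    {
        return store.Keys
            .Where(k => k.Item1 == identifier)
            .Select(k => k.Item2)
            .Distinct();
    }
}
```

With this layout, `store.ContainsKey(TupleKeySketch.Key("xyz", "mycategory", day))` is a single hash lookup, but listing identifiers, categories, or days means enumerating the whole dictionary.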

Brad Cunningham
  • What would that give him over the current nested dictionary structure? – Roman Apr 11 '11 at 22:09
  • 1
    It would be very slow to query along each dimension. – Suraj Apr 11 '11 at 22:15
  • Speed-wise, probably nothing, but usability-wise I would think the indexing code would be a bit cleaner. You could write an extension method to sort and compare items without having to "dot down" the nested object graph – Brad Cunningham Apr 11 '11 at 22:15
  • Well, the dictionaries (at the moment) are just an implementation detail, I assume the external API of this class doesn't expose any of that information, so the only person "dotting down" is him. – Roman Apr 11 '11 at 22:40