3

I need to do some look-up operations against a collection of items.

First I need to see if there is a direct match. This is quite simple as I have the entries in a Dictionary<String,MyObjectType>, so I can just use dictionary["valuetofind"].

If however there is no direct match, then I need to do a starts-with match, but it has to be the longest match that is returned:

Record Examples:

String   Record
0        A
01       B
012      D
02       B
03       C

Query examples:

Query         Result 
0             A    - Because 0   is the longest match
01            B    - Because 01  is the longest match
023456        B    - Because 02  is the longest match
012           D    - Because 012 is the longest match
0123456       D    - Because 012 is the longest match
03456         C    - Because 03  is the longest match
04            A    - Because 0   is the longest match
0456          A    - Because 0   is the longest match
1             Null - No Match

Are there classes in the framework that have hashes or tree structures in the background implementation for doing something like this, or am I needing to write something myself?

EDIT: What I have so far is the list sorted by the length of the pattern string; I go over the entries one by one to see whether the query starts with the record's pattern. This works OK for most situations, as we do not have large lists (yet), but it is expensive when there is no match, because every entry has to be checked.
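For reference, that approach looks roughly like the sketch below. The names (`LinearPrefixLookup`, `Find`) are made up for illustration, plain strings stand in for MyObjectType, and the list is assumed to be sorted longest pattern first so that the first hit is also the longest match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinearPrefixLookup
{
    // Assumes the records are ordered longest pattern first,
    // so the first pattern the query starts with is the longest match.
    public static string Find(IList<KeyValuePair<string, string>> recordsLongestFirst, string query)
    {
        foreach (KeyValuePair<string, string> record in recordsLongestFirst)
        {
            if (query.StartsWith(record.Key, StringComparison.Ordinal))
                return record.Value;
        }
        return null; // no match: every record had to be examined
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample records from the question, sorted by pattern length (descending).
        List<KeyValuePair<string, string>> records = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("0", "A"),
            new KeyValuePair<string, string>("01", "B"),
            new KeyValuePair<string, string>("012", "D"),
            new KeyValuePair<string, string>("02", "B"),
            new KeyValuePair<string, string>("03", "C"),
        }.OrderByDescending(r => r.Key.Length).ToList();

        Console.WriteLine(LinearPrefixLookup.Find(records, "023456") ?? "(no match)"); // B
        Console.WriteLine(LinearPrefixLookup.Find(records, "04") ?? "(no match)");     // A
        Console.WriteLine(LinearPrefixLookup.Find(records, "1") ?? "(no match)");      // (no match)
    }
}
```

The cost is one StartsWith call per record in the worst case, which is the no-match case.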

I lack the vocabulary to get Google to give me pages that do not relate to hash sets, lists and dictionaries. All the research I found points at tree-based structures, but none of it says whether there is already an implementation in the .NET Framework.

My Other Me
  • take a look at http://stackoverflow.com/questions/2765786/quickly-or-concisely-determine-the-longest-string-per-column-in-a-row-based-data and http://stackoverflow.com/questions/3760639/any-framework-functions-helping-to-find-the-longest-common-starting-substring-of – Glory Raj Nov 30 '11 at 13:11
  • The dictionary approaches below are likely `O(n^2 logn)`. A trie would probably work and would only be `O(n logn)`. – leppie Nov 30 '11 at 13:22
  • A Trie-like structure would be the quickest way of solving this in the case that you have a very large set to search. http://en.wikipedia.org/wiki/Trie – spender Nov 30 '11 at 13:23
  • @leppie: Where do the log terms come from in your order approximation? A well-built trie can be searched for a string of length m in O(m) time; the number of nodes in the trie is not a factor. – Eric Lippert Nov 30 '11 at 17:10
  • @EricLippert: You are correct. Not sure what I was thinking ;p I know the first one was just a thumb-sucked guestimate (incorrectly based on 'contains' instead of 'startswith'). – leppie Nov 30 '11 at 17:56

4 Answers

8

Leppie and Spender are correct; the data structure you want to implement to solve this problem efficiently if the data set becomes large is a "trie", or, if you're really buff, a DAWG -- a directed acyclic word graph. A DAWG has better memory performance if the strings have many common suffixes but they are more expensive and difficult to build and update, so start with a trie.

Your simple case would make a trie that looks like this:

           ROOT
            |
           0|
            |
            A
          / | \
         /  |  \
       1/  2|  3\
       /    |    \
      /     |     \
     B      B      C
     |
    2|
     |
     D

So to look up 023456, you start at the root, go down the branch labelled 0 to find A, then go down branch 2 to find B. There is no branch 3 at that point, so you're done, and the last record you passed (B) is the longest match.
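A minimal sketch of that walk in C# could look like the following. It is only an illustration under stated assumptions, not a framework class: the names `PrefixTrie`, `Add` and `FindLongestPrefix` are made up for the example, values are assumed to be non-null reference types, and each node simply keeps its branches in a Dictionary<char, Node>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type for illustration only.
public sealed class PrefixTrie<TValue> where TValue : class
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public TValue Value; // null means "no record ends at this node"
    }

    private readonly Node _root = new Node();

    // Walks (and creates where needed) one node per character of the key.
    public void Add(string key, TValue value)
    {
        Node node = _root;
        foreach (char c in key)
        {
            Node child;
            if (!node.Children.TryGetValue(c, out child))
            {
                child = new Node();
                node.Children.Add(c, child);
            }
            node = child;
        }
        node.Value = value;
    }

    // Returns the value of the longest key that is a prefix of the query,
    // or null if no key is a prefix of it.
    public TValue FindLongestPrefix(string query)
    {
        Node node = _root;
        TValue best = _root.Value;
        foreach (char c in query)
        {
            Node next;
            if (!node.Children.TryGetValue(c, out next))
                break; // no branch for this character: stop walking
            node = next;
            if (node.Value != null)
                best = node.Value; // remember the deepest record seen so far
        }
        return best;
    }
}

public static class Program
{
    public static void Main()
    {
        PrefixTrie<string> trie = new PrefixTrie<string>();
        trie.Add("0", "A");
        trie.Add("01", "B");
        trie.Add("012", "D");
        trie.Add("02", "B");
        trie.Add("03", "C");

        Console.WriteLine(trie.FindLongestPrefix("023456") ?? "(no match)");  // B
        Console.WriteLine(trie.FindLongestPrefix("0123456") ?? "(no match)"); // D
        Console.WriteLine(trie.FindLongestPrefix("04") ?? "(no match)");      // A
        Console.WriteLine(trie.FindLongestPrefix("1") ?? "(no match)");       // (no match)
    }
}
```

A lookup touches at most one node per character of the query, so a miss costs no more than a hit, regardless of how many records are stored.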

Incidentally, this is also the data structure you'd use to find the longest Scrabble word given a dictionary and a set of letters; it's essentially the same problem.

There's no trie data structure built into the .NET framework, but it is not a difficult data structure to build. I've got an immutable trie lying around here somewhere that I've been meaning to blog about; if I ever do, I'll post a link here.

Eric Lippert
  • We make extensive use of tries (and trie-like graphs) for a number of purposes, but my favorite is super fast (and non-cpu intensive) autocomplete for our website over a large set of items. It's costly in terms of memory usage, but it makes searching instantaneous. In my mind, a much underrated data-structure. Would be great to see you post something about it on your blog. – spender Dec 01 '11 at 00:54
  • I've actually implemented a trie once in JavaScript for the exact same thing as spender; the server returns the data as an array of items that look like `{'e': {'x': {'a': {'m': {'p': {'l': {'e': {'value': 'Example'}}}}}}}}` and we use `jQuery.extend` to build a trie from it with one method call. – configurator Dec 03 '11 at 11:14
1

A rather simple way is to brute-force it. I assume that you have a Dictionary<string, string> _lookupTable that holds your lookups:

string Find(string query)
{
    string retval; // declared explicitly: 'var retval = null' does not compile
    // Try the full query first, then drop one trailing character per miss.
    while (!string.IsNullOrEmpty(query))
    {
        if (_lookupTable.TryGetValue(query, out retval))
            return retval;
        query = query.Substring(0, query.Length - 1);
    }
    return null; // no prefix of the query is a key
}
esskar
0

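You could just scan the whole Dictionary for the longest match.

string sQuery = "01234";
string result = null;
int iMaxLength = 0;

// mD is assumed to be a Dictionary<String, String> mapping pattern to record.
foreach (KeyValuePair<String, String> kVP in mD)
{
    // Match on the key (the pattern) and require a prefix match, not Contains.
    if (sQuery.StartsWith(kVP.Key, StringComparison.Ordinal) && (kVP.Key.Length > iMaxLength))
    {
        iMaxLength = kVP.Key.Length;
        result = kVP.Value;
    }
}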
Shai
0

By the looks of it, you should use a binary tree which is simply sorted on length, then look for the first match. I don't think something like a binary tree is already implemented in C#, but a quick search reveals many sites where people have done so.

hcb