I am working on a small project but have run into a performance roadblock.

I have a Dictionary<string, string>()

I have a string[].

Let's say my Dictionary has 50,000 entries, and my string[] has 30,000 entries.

I want to collect the keys from my Dictionary where the value's sorted characters (value.ToCharArray().OrderBy(x => x)) equal the sorted characters (value.ToCharArray().OrderBy(x => x)) of some entry in my string[].

I have tried reducing the number of KeyValue pairs I have to look through by comparing the length of my string[] value to the values in the Dictionary, but that has not really gained me any performance.

Does anyone have any ideas how I can improve the performance of this lookup?

Thanks!

To expand the pseudocode:

var stringToLookUp = GetSomeStrings(s.ToString()).Select(x => x).OrderBy(x => x).ToArray();
var aDictionaryOfStringString = GetDictionary(Resources.stringList);

var results = new List<string>();

foreach (var theString in stringToLookUp.Where(aString=> aString.Length > 0))
{
    if (theString.Length > 0)
    {
        var theStringClosure = theString;

        var filteredKeyValuePairs = aDictionaryOfStringString.Where(w => w.Value.Length == theStringClosure.Length && !results.Contains(w.Key)).ToArray();
        var foundStrings = filteredKeyValuePairs.Where(kv => kv.Value.ToCharArray().OrderBy(c => c).ToArray().SequenceEqual(theStringClosure))
                .Select(kv => kv.Key)
                .ToArray();
        if (foundStrings.Any()) results.AddRange(foundStrings);
    }
}
Rawle
  • What exactly are the criteria for the selection? Where the values match, or where the characters match in any order? So that `foo` will match `oof` and `ofo`? – Adriaan Stander Mar 08 '15 at 18:35
  • `value.ToCharArray().OrderBy(x => x)` can be simplified to `value.OrderBy(x => x)`, which saves you 50k+30k array allocations; the pseudocode is not enough to say more. – ASh Mar 08 '15 at 18:35
  • The order must match exactly. – Rawle Mar 08 '15 at 18:36

2 Answers

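I think the principal problem is that you iterate over the whole dictionary on every iteration of the outer loop. With N dictionary entries and M array entries that is O(N·M), i.e. effectively O(N^2). It is better to build a hash set of the sorted strings from one collection (either the dictionary or the array) and then iterate over the other collection once, probing the set. That is O(N + M), i.e. linear.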

using System.Collections.Generic;
using System.Linq;

// Placeholder inputs; substitute your own dictionary and string array.
var dictionary = new Dictionary<string, string>();
var fields = new string[] { };

// Sort the characters of each array entry once and keep the results in a hash set.
var set = new HashSet<string>(
    fields.Select(field => new string(field.OrderBy(c => c).ToArray())));

// Single pass over the dictionary: sort each value's characters and probe the set.
var results = new List<string>();
foreach (var pair in dictionary)
{
    string key = new string(pair.Value.OrderBy(c => c).ToArray());
    if (set.Contains(key))
    {
        results.Add(pair.Key);
    }
}
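
For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch with made-up sample data. The names `SortedKeyLookup`, `SortChars` and `FindKeys` are hypothetical helpers introduced only for this illustration; they are not part of the question's code. It assumes that "match" means the sorted characters are identical, and it skips empty lookup strings as the question's loop does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedKeyLookup
{
    // Canonical form of a string: its characters in ordinal sorted order ("oof" -> "foo").
    public static string SortChars(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Sort(chars);
        return new string(chars);
    }

    // Returns the dictionary keys whose value has the same sorted characters
    // as at least one non-empty lookup string.
    public static List<string> FindKeys(
        IDictionary<string, string> dictionary,
        IEnumerable<string> lookups)
    {
        // Each lookup string is sorted exactly once.
        var sortedLookups = new HashSet<string>(
            lookups.Where(s => s.Length > 0).Select(s => SortChars(s)));

        // Each dictionary value is sorted exactly once and probed against the set.
        return dictionary
            .Where(pair => sortedLookups.Contains(SortChars(pair.Value)))
            .Select(pair => pair.Key)
            .ToList();
    }

    public static void Main()
    {
        // Sample data, invented for the demonstration.
        var dictionary = new Dictionary<string, string>
        {
            ["a"] = "foo",
            ["b"] = "bar",
            ["c"] = "baz",
        };
        var lookups = new[] { "oof", "rab", "" };

        // Keys "a" and "b" match; "c" does not.
        Console.WriteLine(string.Join(", ", FindKeys(dictionary, lookups)));
    }
}
```

Each string is sorted once, so the total work is roughly proportional to N + M (times the cost of sorting one string) rather than N·M.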
Ondrej Svejdar
  • This has worked for me, and your explanation is crystal clear. Thank you @ondrej – Rawle Mar 08 '15 at 19:08
  • An additional resource for anyone reading this solution: https://justin.abrah.ms/computer-science/big-o-notation-explained.html – Rawle Mar 10 '15 at 03:10

You can try this:

var stringToLookUp = GetSomeStrings(s.ToString()).Select(x => x).OrderBy(x => x).ToArray();
var aDictionaryOfStringString = GetDictionary(Resources.stringList);

// Compare the sorted characters element by element with SequenceEqual.
var results = aDictionaryOfStringString
    .Where(kvp => stringToLookUp.Any(str => str.OrderBy(x => x).SequenceEqual(kvp.Value.OrderBy(x => x))))
    .Select(kvp => kvp.Key)
    .ToList();
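
One thing worth checking with queries of this shape: `Enumerable.Contains` over a sequence of sequences compares elements with the default equality comparer, which for the objects returned by `OrderBy` is reference equality, not an element-by-element comparison. The following is a minimal sketch of the difference; the class and method names are hypothetical and exist only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceComparisonDemo
{
    // Contains compares the sequence objects themselves (reference equality),
    // so two separately created OrderBy results never count as equal.
    public static bool MatchByContains(IEnumerable<string> lookups, string value)
    {
        return lookups.Select(l => l.OrderBy(c => c)).Contains(value.OrderBy(c => c));
    }

    // SequenceEqual compares the sorted characters one by one.
    public static bool MatchBySequenceEqual(IEnumerable<string> lookups, string value)
    {
        char[] sortedValue = value.OrderBy(c => c).ToArray();
        return lookups.Any(l => l.OrderBy(c => c).SequenceEqual(sortedValue));
    }

    public static void Main()
    {
        var lookups = new[] { "oof" };
        Console.WriteLine(MatchByContains(lookups, "foo"));      // prints "False"
        Console.WriteLine(MatchBySequenceEqual(lookups, "foo")); // prints "True"
    }
}
```

Either way, the query still sorts every lookup string again for every dictionary entry, so its cost stays proportional to N·M.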
Guy
  • While the code is correct, I have found this to be as slow as my original solution. I believe Ondrej's explanation above explains why this is slow. I can also guess that deferred execution may have something to do with the slowness. – Rawle Mar 08 '15 at 19:15