I have words from OCR and need a list of close matches. I can live without the maxFrom filter. The sample code is brute force, but hopefully it defines the requirement. Against a list of 600,000 words it takes 2 seconds. FTSword.Word is a string.
Ideally, "findd" would give extra credit for the second d only if the word actually contains a second d. Also, once it has matched the i, a later f should get no credit (the letters should match in order). I can do that by brute force, but I am looking to get that 2 seconds down. I will test and report back on any proposed solution.
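Both of these rules can be expressed at once by scoring each word with the length of the longest common subsequence (LCS) between the search term and the word. Each letter of the word can be used only once, so the second d needs a second d in the word, and letters must match in order, so an f after the i gets no credit. This is a minimal sketch, assuming a case-insensitive comparison is wanted; `InOrderMatch` and `Score` are hypothetical names, not part of the original code.

```csharp
using System;

// Hypothetical helper: scores a word by how many letters of the search term
// it contains in order, each word letter used at most once. This is the
// length of the longest common subsequence, compared case-insensitively.
public static class InOrderMatch
{
    public static int Score(string find, string word)
    {
        // prev[j] = LCS of the search prefix handled so far and word[0..j).
        var prev = new int[word.Length + 1];
        var curr = new int[word.Length + 1];

        foreach (char fcRaw in find)
        {
            char fc = char.ToLowerInvariant(fcRaw);
            for (int j = 1; j <= word.Length; j++)
            {
                if (char.ToLowerInvariant(word[j - 1]) == fc)
                    curr[j] = prev[j - 1] + 1;               // extend a match
                else
                    curr[j] = Math.Max(prev[j], curr[j - 1]); // skip a letter
            }
            // Reuse the two rows; index 0 is never written and stays 0.
            (prev, curr) = (curr, prev);
        }
        return prev[word.Length];
    }
}
```

As a filter this would be `fTSwords.Where(w => InOrderMatch.Score("find", w.Word) >= minMatch)`. For words under 10 characters the cost is a few dozen comparisons per word, but it has not been timed against the 600,000-word list.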
The question is: how can I make this faster (and smarter)?
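For speed, one option is to stop comparing characters at query time: precompute, once per word, a 26-bit mask of which letters a-z it contains, then count the shared letters with a bitwise AND and a population count. This is a sketch under stated assumptions: only ASCII letters matter (other characters are ignored), a repeated search letter counts once (so it does not give the extra credit for a second d), and `BitOperations.PopCount` requires .NET Core 3.0 or later. `LetterMask` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper: "how many distinct search letters appear anywhere in
// the word" becomes one AND plus one popcount on precomputed masks.
public static class LetterMask
{
    // Bit 0 = 'a', bit 25 = 'z'; non a-z characters are ignored.
    public static uint Build(string s)
    {
        uint mask = 0;
        foreach (char raw in s)
        {
            char c = char.ToLowerInvariant(raw);
            if (c >= 'a' && c <= 'z')
                mask |= 1u << (c - 'a');
        }
        return mask;
    }

    // Number of distinct letters the two masks have in common.
    public static int CommonLetters(uint findMask, uint wordMask) =>
        BitOperations.PopCount(findMask & wordMask);

    // Returns the indices of words sharing at least minMatch letters with find.
    public static List<int> Matches(uint[] wordMasks, string find, int minMatch)
    {
        uint findMask = Build(find);
        var hits = new List<int>();
        for (int i = 0; i < wordMasks.Length; i++)
            if (CommonLetters(findMask, wordMasks[i]) >= minMatch)
                hits.Add(i);
        return hits;
    }
}
```

The masks would be built once, e.g. `uint[] wordMasks = fTSwords.Select(w => LetterMask.Build(w.Word)).ToArray();`, and reused for every search; the returned indices point back into `fTSwords`. The saving only pays off if the same list is searched more than once. The mask test can also serve as a cheap prefilter before a slower in-order score.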
Thanks
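using System.Collections.Generic;
using System.Diagnostics;

// fTSwords is the existing list of ~600,000 FTSword objects (FTSword.Word is a string).
char[] find = new char[] { 'f', 'i', 'n', 'd' };
int maxFrom = 10;   // only consider words shorter than this (optional)
int minMatch = 3;   // minimum number of search letters that must be present
List<FTSword> matchWords = new List<FTSword>();
foreach (FTSword ftsw in fTSwords)
{
    if (ftsw.Word.Length < maxFrom)
    {
        char[] word = ftsw.Word.ToCharArray();
        int count = 0;
        // Credit each search letter once if it occurs anywhere in the word
        // (case-insensitive). Order is ignored, and a repeated search letter,
        // such as the second 'd' in "findd", is credited again by the same 'd'.
        foreach (char fc in find)
        {
            foreach (char wc in word)
            {
                if (char.ToLower(wc) == char.ToLower(fc))
                {
                    count++;
                    break;
                }
            }
        }
        if (count >= minMatch)
        {
            // Debug.WriteLine(count.ToString() + ftsw.Word);
            matchWords.Add(ftsw);
        }
    }
}
Debug.WriteLine(matchWords.Count.ToString());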
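Because each word is scored independently, the loop above can also be spread across cores with PLINQ. This sketch keeps the original rule (letters present anywhere, repeated search letters credited again) but replaces the inner character loop with `string.IndexOf(char, StringComparison.OrdinalIgnoreCase)`, which avoids the `ToCharArray` allocation. That overload exists on .NET Core but not on .NET Framework, and `OrdinalIgnoreCase` is culture-invariant, unlike `char.ToLower`. The `FTSword` class here is a minimal stand-in with only the `Word` property so the sketch compiles on its own; `ParallelFilter` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's FTSword type (only Word is assumed).
public class FTSword
{
    public string Word { get; set; } = "";
}

public static class ParallelFilter
{
    // Same rule as the brute-force loop: count search letters that occur
    // anywhere in the word, case-insensitively; repeats are credited again.
    public static int CountPresent(string find, string word)
    {
        int count = 0;
        foreach (char fc in find)
            if (word.IndexOf(fc, StringComparison.OrdinalIgnoreCase) >= 0)
                count++;
        return count;
    }

    public static List<FTSword> Filter(
        IEnumerable<FTSword> words, string find, int minMatch, int maxFrom)
    {
        return words
            .AsParallel()
            .AsOrdered() // keep results in the same order as the input list
            .Where(w => w.Word.Length < maxFrom && CountPresent(find, w.Word) >= minMatch)
            .ToList();
    }
}
```

Usage would be `var matchWords = ParallelFilter.Filter(fTSwords, "find", minMatch, maxFrom);`. Dropping `AsOrdered()` may be slightly faster if the order of the results does not matter. Any gain depends on the core count and has not been measured here.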