I have a SQL CLR table-valued function that accepts two string parameters, compares two company names, and returns a match score.
This is the C# function I'm using to determine the likelihood of two strings matching.
It works well, but because the code is so simple, comparing `HN FELT 09 AS` with `HN FELT 01 AS` gives a high percentage. That is correct as far as it goes, but I want to reduce the score by 50% when the difference between the strings is a digit or digits. How can I achieve this with the function below?
```csharp
// at the top of the file
using System.Linq;  // Contains() on string[]
using System.Text;  // Encoding

public static decimal CompareText(string String1, string String2)
{
    // some more string cleaning: replace punctuation with spaces, map accented
    // characters to plain ASCII, then collapse runs of spaces into one
    String1 = String1.Replace(",", " ").Replace(".", " ").Replace("/", " ").Trim();
    String1 = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(String1));
    String1 = String1.Replace(" ", " |").Replace("| ", "").Replace("|", "");
    String1 = WordFunctions.RemoveDuplicateWords(String1); // helper defined elsewhere

    // same cleaning for the second string
    String2 = String2.Replace(",", " ").Replace(".", " ").Replace("/", " ").Trim();
    String2 = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(String2));
    String2 = String2.Replace(" ", " |").Replace("| ", "").Replace("|", "");
    String2 = WordFunctions.RemoveDuplicateWords(String2);

    string[] String1SeparateWords = String1.Split(' ');
    string[] String2SeparateWords = String2.Split(' ');
    int String1WordCount = 0;
    int String2WordCount = 0;
    decimal theResult = 0;
    String1WordCount = String1SeparateWords.Length;
    String2WordCount = String2SeparateWords.Length;

    // count the words of String1 that also appear in String2
    int SameWordCount = 0;
    foreach (string String1Word in String1SeparateWords)
    {
        if (String2SeparateWords.Contains(String1Word)) { SameWordCount++; }
    }

    // divide by the larger word count (the final else can never be reached)
    if (String1WordCount > String2WordCount) { theResult = (decimal)SameWordCount / String1WordCount; }
    else if (String2WordCount > String1WordCount) { theResult = (decimal)SameWordCount / String2WordCount; }
    else if (String1WordCount == String2WordCount) { theResult = (decimal)SameWordCount / String1WordCount; }
    else { theResult = 0; }
    return (theResult * 100);
}
```
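Stripped of the string cleaning, the scoring step divides the number of shared words by the larger of the two word counts: the three reachable branches of the `if`/`else` chain all reduce to that. A minimal sketch, assuming the words are already cleaned and split; `WordScoreSketch` and `Score` are hypothetical names, not part of the original code:

```csharp
using System;
using System.Linq;

public static class WordScoreSketch
{
    // Hypothetical helper mirroring the scoring step of CompareText:
    // words of the first array that also occur in the second, divided by
    // the larger word count, expressed as a percentage.
    public static decimal Score(string[] words1, string[] words2)
    {
        int same = words1.Count(w => words2.Contains(w));
        int larger = Math.Max(words1.Length, words2.Length);
        if (larger == 0) return 0m; // guard only; string.Split never returns an empty array
        return (decimal)same / larger * 100;
    }
}
```

Assuming `WordFunctions.RemoveDuplicateWords` leaves these names unchanged, `Score("HN FELT 09 AS".Split(' '), "HN FELT 01 AS".Split(' '))` returns 75: `HN`, `FELT` and `AS` match and only `09` does not, which is why such pairs currently score high.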
This is the part that compares the words (simple, but it works):

```csharp
int SameWordCount = 0;
foreach (string String1Word in String1SeparateWords)
{
    // a word counts as "same" only if it occurs verbatim in the other name
    if (String2SeparateWords.Contains(String1Word)) { SameWordCount++; }
}
```
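The same loop can also record which unmatched words contain digits, instead of only counting the matches. A sketch under that assumption (it uses a C# 7 value tuple for the result); `MismatchSketch` and `CountWords` are hypothetical names:

```csharp
using System.Linq;

public static class MismatchSketch
{
    // Hypothetical helper: the word loop from CompareText, extended to split
    // the String1 words that have no match in String2 into words containing
    // at least one digit and words containing none.
    public static (int same, int digitMismatches, int otherMismatches) CountWords(
        string[] string1Words, string[] string2Words)
    {
        int same = 0, digitMismatches = 0, otherMismatches = 0;
        foreach (string word in string1Words)
        {
            if (string2Words.Contains(word))
                same++;
            else if (word.Any(char.IsDigit))
                digitMismatches++;   // e.g. "09" when the other name has "01"
            else
                otherMismatches++;
        }
        return (same, digitMismatches, otherMismatches);
    }
}
```

For `HN FELT 09 AS` against `HN FELT 01 AS` this returns `(3, 1, 0)`: every unmatched word contains a digit. Because only the first name's words are checked, an extra non-numeric word in the second name would go unnoticed, so checking in both directions is safer before applying a penalty.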
What I can't work out is how to detect when the mismatched words differ only in digits.
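A sketch of the 50% reduction, assuming the rule is "the word lists differ, and every word that appears in only one of them contains a digit". It uses `HashSet<T>.SymmetricExceptWith`, so both names are checked and word order is ignored; `DigitPenaltySketch`, `DifferOnlyInDigits` and `ApplyDigitPenalty` are hypothetical names, not part of the original code:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DigitPenaltySketch
{
    // Hypothetical helper: true when the word lists differ and every word
    // found in exactly one of them contains at least one digit
    // (e.g. "09" vs "01"). Word order is ignored.
    public static bool DifferOnlyInDigits(string[] words1, string[] words2)
    {
        var onlyInOne = new HashSet<string>(words1);
        onlyInOne.SymmetricExceptWith(words2); // words present in exactly one list
        return onlyInOne.Count > 0 && onlyInOne.All(w => w.Any(char.IsDigit));
    }

    // Hypothetical wrapper: halves a CompareText-style score (0-100) when the
    // only differences are words containing digits.
    public static decimal ApplyDigitPenalty(decimal score, string[] words1, string[] words2)
    {
        return DifferOnlyInDigits(words1, words2) ? score * 0.5m : score;
    }
}
```

Inside `CompareText` it could replace the last line, e.g. `return DigitPenaltySketch.ApplyDigitPenalty(theResult * 100, String1SeparateWords, String2SeparateWords);`, which takes the example pair from 75 to 37.5 (assuming the cleaning leaves the names unchanged). `w.Any(char.IsDigit)` treats any word containing a digit as numeric; use `w.All(char.IsDigit)` instead if only purely numeric words such as `09` should trigger the penalty.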