
I have a SQL CLR table-valued function that accepts two string parameters, compares them as company names, and returns a match score.

The C# function below is what I'm using to determine the likelihood of two strings matching.

This works well. However, because the code is so simple, comparing HN FELT 09 AS to HN FELT 01 AS gives a high percentage. That is correct, but I want to reduce the result by 50% if the difference between the strings is a digit or digits. How can I achieve this with the function below?

public static decimal CompareText(string String1, string String2)
{
    // some more string cleaning
    String1 = String1.Replace(",", " ").Replace(".", " ").Replace("/", " ").Trim();
    String1 = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(String1));
    String1 = String1.Replace("  ", " |").Replace("| ", "").Replace("|", "");
    String1 = WordFunctions.RemoveDuplicateWords(String1);

    String2 = String2.Replace(",", " ").Replace(".", " ").Replace("/", " ").Trim();
    String2 = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(String2));
    String2 = String2.Replace("  ", " |").Replace("| ", "").Replace("|", "");
    String2 = WordFunctions.RemoveDuplicateWords(String2);

    string[] String1SeparateWords = String1.Split(' ');
    string[] String2SeparateWords = String2.Split(' ');

    int String1WordCount = 0;
    int String2WordCount = 0;
    decimal theResult = 0;

    String1WordCount = String1SeparateWords.Length;
    String2WordCount = String2SeparateWords.Length;

    int SameWordCount = 0;

    foreach (string String1Word in String1SeparateWords)
    {
        if (String2SeparateWords.Contains(String1Word)) { SameWordCount++; }
    }

    // Score against the longer word list. Split always returns at least one element, so the divisor is never zero.
    theResult = (decimal)SameWordCount / Math.Max(String1WordCount, String2WordCount);

    return (theResult * 100);
}

This is the part that compares the words (simple but works):

int SameWordCount = 0;

foreach (string String1Word in String1SeparateWords)
{
    if (String2SeparateWords.Contains(String1Word)) { SameWordCount++; }
}

I'm not able to work out how to check for mismatches in digits.
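To make the requirement concrete, here is a minimal sketch of one way the digit check could look. It is not tested inside SQL CLR. The class `DigitPenaltySketch` and the methods `IsAllDigits` and `ScoreWithDigitPenalty` are hypothetical names invented for illustration, and it assumes "the difference is a digit or digits" means that every word appearing in only one of the two strings consists entirely of digits. It takes the already cleaned and split word arrays as input.

```csharp
using System;
using System.Linq;

public static class DigitPenaltySketch
{
    // Hypothetical helper: true when the word is non-empty and contains only the characters 0-9.
    private static bool IsAllDigits(string word)
    {
        return word.Length > 0 && word.All(c => c >= '0' && c <= '9');
    }

    // Hypothetical scorer: same word-overlap percentage as CompareText,
    // halved when the unmatched words on both sides are purely numeric.
    public static decimal ScoreWithDigitPenalty(string[] words1, string[] words2)
    {
        // Count the words of the first list that also occur in the second.
        int sameWordCount = words1.Count(w => words2.Contains(w));

        // Score against the longer word list, as the original function does.
        int longest = Math.Max(words1.Length, words2.Length);
        decimal score = (decimal)sameWordCount / longest * 100;

        // Words that occur in only one of the two lists.
        string[] onlyIn1 = words1.Except(words2).ToArray();
        string[] onlyIn2 = words2.Except(words1).ToArray();

        // Assumption: the penalty applies only when there is a mismatch
        // and every mismatching word is made up of digits.
        bool hasMismatch = onlyIn1.Length > 0 || onlyIn2.Length > 0;
        bool onlyDigitsDiffer = hasMismatch
            && onlyIn1.All(IsAllDigits)
            && onlyIn2.All(IsAllDigits);

        return onlyDigitsDiffer ? score / 2 : score;
    }
}
```

With this sketch, HN FELT 09 AS against HN FELT 01 AS shares 3 of 4 words (75%), and because the only mismatching words are 09 and 01 the result is halved to 37.5. Note that under the stated assumption a numeric word present in only one string (HN FELT AS against HN FELT 01 AS) would also trigger the penalty, which may or may not be what is wanted.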

Pondlife
Abu Dina
