I have a list of strings, and I want to find the start and end indices of their occurrences in a given string.

Of the strings in the list that occur in the original string, I want to print only the longest one.

Here is my code:

public static void Main(string[] args)
{
    // This is my original string, in which I want to find the occurrence of the longest common substring
    string str = "will you consider the lic premium of my in-laws for tax exemption";

    //Here are the substrings which I want to compare   
    List<string> subStringsToCompare = new List<string>
    {
        "Life Insurance Premium",
        "lic",
        "life insurance",
        "life insurance policy",
        "lic premium",
        "insurance premium",
        "insurance premium",
        "premium"
    };

    foreach(var item in subStringsToCompare)
    {
        int start = str.IndexOf(item);

        if(start != -1)
        {
            Console.WriteLine("Match found: '{0}' at {1} till {2} character position", item, start, start + item.Length);
        }
    }
}

The problem is that I am getting three matches instead of one. I can't figure out the condition that picks the longest matching substring out of all the substrings to compare.

Output I am getting:

  • Match found: 'lic' at 22 till 25 character position
  • Match found: 'lic premium' at 22 till 33 character position
  • Match found: 'premium' at 26 till 33 character position

Output expected:

  • Match found: 'lic premium' at 22 till 33 character position

Kunal Mukherjee

3 Answers

Here's what I suggested in my comment: sort the substrings by length in descending order and stop at the first match.

public static void Main(string[] args)
{
    // This is my original string, in which I want to find the occurrence of the longest common substring
    string str = "will you consider the lic premium of my in-laws for tax exemption";

    // Here are the substrings which I want to compare
    // (Sorted by length descending) 
    List<string> subStringsToCompare = new List<string>
    {
        "Life Insurance Premium",
        "life insurance policy",
        "insurance premium",
        "life insurance",
        "lic premium",
        "premium",
        "lic"
    };

    foreach(var item in subStringsToCompare)
    {
        int start = str.IndexOf(item);

        if(start != -1)
        {
            Console.WriteLine("Match found: '{0}' at {1} till {2} character position", item, start, start + item.Length);

            break; // Stop at the first match found
        }
    }
}
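
If you'd rather not keep the list sorted by hand, the ordering can be done in code. This is just a minimal sketch, assuming LINQ is available. The class name `LongestMatchSorted` and the method `FindLongest` are made-up names for illustration, and the use of `StringComparison.Ordinal` is my assumption (the original code calls `IndexOf` without a comparison argument).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LongestMatchSorted
{
    // Returns the longest candidate that occurs in text, or null if none does.
    // (Hypothetical helper, not part of the original code.)
    public static string FindLongest(string text, IEnumerable<string> candidates)
    {
        return candidates
            .OrderByDescending(c => c.Length) // longest candidates first
            .FirstOrDefault(c => text.IndexOf(c, StringComparison.Ordinal) != -1);
    }

    public static void Main()
    {
        string str = "will you consider the lic premium of my in-laws for tax exemption";

        // Same candidates as in the question, in their original (unsorted) order
        var subStringsToCompare = new List<string>
        {
            "Life Insurance Premium", "lic", "life insurance", "life insurance policy",
            "lic premium", "insurance premium", "premium"
        };

        string longest = FindLongest(str, subStringsToCompare);
        if (longest != null)
        {
            int start = str.IndexOf(longest, StringComparison.Ordinal);
            Console.WriteLine("Match found: '{0}' at {1} till {2} character position",
                longest, start, start + longest.Length);
        }
    }
}
```

With the question's input this prints only the 'lic premium' line. Sorting costs a little extra, but the search stops at the first hit, just like the `break` in the loop version.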
vc 74

If you only need an exact match against the strings in the list (not against substrings of those strings), then you are pretty close:

string longest = null;
int longestStart = 0;
foreach(var item in subStringsToCompare)
{
    int start = str.IndexOf(item);

    if(start != -1 && (longest == null || item.Length > longest.Length))
    {
        longest = item;
        longestStart = start;
    }
}

if (longest != null)
{
    Console.WriteLine("Match found: '{0}' at {1} till {2} character position", longest, longestStart, longestStart + longest.Length);
}
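
The same single-pass idea can be wrapped in a reusable method. Treat this as a sketch rather than a finished implementation: `LongestMatchFinder` and `Find` are hypothetical names, and the optional `StringComparison` parameter is an assumption I added so that the search can ignore case if the capitalization of entries like "Life Insurance Premium" should not matter. The tuple return type needs C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public static class LongestMatchFinder
{
    // Scans every candidate once and remembers the longest one found in text.
    // Returns (null, -1) when no candidate occurs in text.
    public static (string Match, int Start) Find(
        string text,
        IEnumerable<string> candidates,
        StringComparison comparison = StringComparison.Ordinal)
    {
        string longest = null;
        int longestStart = -1;

        foreach (string item in candidates)
        {
            int start = text.IndexOf(item, comparison);

            // Keep the candidate only if it matched and is strictly longer than the best so far
            if (start != -1 && (longest == null || item.Length > longest.Length))
            {
                longest = item;
                longestStart = start;
            }
        }

        return (longest, longestStart);
    }

    public static void Main()
    {
        string str = "will you consider the lic premium of my in-laws for tax exemption";
        var subStringsToCompare = new List<string>
        {
            "Life Insurance Premium", "lic", "life insurance", "life insurance policy",
            "lic premium", "insurance premium", "premium"
        };

        var result = Find(str, subStringsToCompare);
        if (result.Match != null)
        {
            Console.WriteLine("Match found: '{0}' at {1} till {2} character position",
                result.Match, result.Start, result.Start + result.Match.Length);
        }
    }
}
```

Passing `StringComparison.OrdinalIgnoreCase` as the third argument makes the search case-insensitive. Because the length check uses a strict `>`, the first of several equally long matches is the one that is kept.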
Steve

I haven't tried this, but try the following:

List<string> matches = new List<string>();
for( int i = 0; i < str.Length; i++ )
{
    foreach ( string toCompare in subStringsToCompare )
    {
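        // Substring throws if the requested range runs past the end of str, so check the bounds first
        if ( i + toCompare.Length <= str.Length && str.Substring( i, toCompare.Length ) == toCompare )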
            matches.Add( toCompare );
    }
}

string longest = "";
foreach ( string match in matches )
{
    if ( match.Length > longest.Length )
        longest = match;
}
  • This would further increase the complexity resulting in unnecessary computations. – Kunal Mukherjee Jun 21 '18 at 19:22
  • Could you perhaps explain why you think your changes are better? As it is we are left guessing why you think this is an improvement. For example I can't see what advantage there is in going character by character through the string, finding a substring and comparing them rather than just using .indexof as the OP did originally. – Chris Jun 22 '18 at 10:23
  • "Changes"? My answer was no change of some code there is not existing... – Niklas Fehde Jun 22 '18 at 12:51