I am trying to run a regex search on `NNTSY` so that I get two matches.

  • NNTS
  • NTSY

When I attempted to match using the pattern `(?<NGrlyosylation>N[^P][ST][^P])`, I got only one match, which is NNTS.

How can I use Regex to match NNTSY so that two matches can be found?

NOTE: Background info: Rosalind problem can be found here.

Here is my code.

        input = "NNTSY";
        Regex regex = new Regex("(?<NGrlyosylation>N[^P][ST][^P])", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        MatchCollection matches = regex.Matches(input);
        foreach (Match match in matches)
        {
            // Add 1 because match.Index is 0-based
            const int offset = 1;
            yield return match.Index + offset;
        }
dance2die

1 Answer

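Most regex engines do not return overlapping matches: each match consumes the characters it covers, and the search resumes after them. So I don't think a plain pattern can do this, but you can wrap the pattern in a lookahead, which consumes nothing, and use Substring in C# to extract the four characters at each match position: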

(?=N[^P][ST][^P]).

C# Code

string input = "NNTSY";
Regex regex = new Regex("(?=N[^P][ST][^P]).", RegexOptions.Compiled | RegexOptions.IgnoreCase);

Match match = regex.Match(input);

while (match.Success)
{
    Console.WriteLine(input.Substring(match.Index, 4));
    match = match.NextMatch();
}

Ideone Demo
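
As a variation on the code above, here is a minimal sketch that puts a capture group inside the lookahead, so the matched text can be read from the group instead of calling Substring, and that returns the 1-based positions the question asks for. The class name `OverlappingMotifs` and the method `Find` are made up for illustration, and the trailing `.` is left out on the assumption that a bare lookahead is enough here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlappingMotifs
{
    // Hypothetical helper (not from the original post): returns every
    // overlapping occurrence of the motif with its 1-based position.
    public static List<(int Position, string Text)> Find(string input)
    {
        // The whole pattern sits inside a lookahead, so each match is
        // zero-width and the engine retries from the next character.
        // Group 1 still records the four characters that were looked at.
        var regex = new Regex("(?=(N[^P][ST][^P]))", RegexOptions.IgnoreCase);

        var results = new List<(int Position, string Text)>();
        foreach (Match match in regex.Matches(input))
        {
            // match.Index is 0-based, so add 1 for a 1-based position.
            results.Add((match.Index + 1, match.Groups[1].Value));
        }
        return results;
    }

    public static void Main()
    {
        foreach (var (position, text) in Find("NNTSY"))
        {
            Console.WriteLine($"{position}: {text}");
        }
    }
}
```

For the input `NNTSY` this should print `1: NNTS` followed by `2: NTSY`.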

rock321987
  • Thanks @rock321987. I wasn't aware of Regex `lookahead` concept so I was lost. At least the tests I created passes all cases and I am ready to move on. – dance2die May 22 '16 at 04:34
  • @Sung glad, it helped..`lookaheads` are zero width-assertion..it means that they do not consume any character..the regex will be checked against the string without the matched string being consumed – rock321987 May 22 '16 at 04:38
  • @Sung `.` was not required in the regex..simple lookahead will suffice here..adding `.` will speed up the process a bit – rock321987 May 22 '16 at 04:41