Foreword
The problem I have with this type of question:
- What happens if you have "pqr hello world merger was merger to be undertaken between merger xyz as the sole acquirer but got delayed"? Each instance of `merger` is followed by at least one `delayed` which is more than 4 words away; however, each `merger` doesn't have its own `delayed`.
It would be easier to test for the bad things and then use program logic to accept or reject the result.
Description
This regex will match all of the strings which violate your conditions. If the regex doesn't match, then the string should be considered good.
- is there a `merger` which has a trailing `merger`
- does each `merger` have a corresponding instance of `delayed`
- does the `delayed` appear within the first 4 words after each `merger`
In addition to looking for the bad things the expression should do the following:
- handle multiple line strings correctly
- ensure `merger` and `delayed` are not part of a larger word
(?:^|\s)merger(?:(?=([\s\r\n]+(?:(?!delayed\b)\w+[\r\n\s]+)*?(?:merger|$)(?:[\s\r\n]|$)))|(?=([\s\r\n]+(?:\w+[\r\n\s]+){0,4}delayed(?:[\s\r\n]|$))))
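
The expression is dense, so here is a sketch of the same pattern laid out with `RegexOptions.IgnorePatternWhitespace`, which lets .NET ignore layout whitespace and treat `#` as a comment marker. The class and field names (`AnnotatedPattern`, `Compact`, `Annotated`) are made up for illustration; the pattern tokens themselves are unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnnotatedPattern
{
    // The one-line pattern exactly as given above.
    public static readonly Regex Compact = new Regex(
        @"(?:^|\s)merger(?:(?=([\s\r\n]+(?:(?!delayed\b)\w+[\r\n\s]+)*?(?:merger|$)(?:[\s\r\n]|$)))|(?=([\s\r\n]+(?:\w+[\r\n\s]+){0,4}delayed(?:[\s\r\n]|$))))",
        RegexOptions.IgnoreCase);

    // Same tokens, spread over several lines with comments.
    public static readonly Regex Annotated = new Regex(@"
        (?:^|\s)merger                        # merger at start of text or after whitespace
        (?:
          (?=(                                # group 1: no delayed before the next merger or the end
            [\s\r\n]+                         #   whitespace after merger
            (?:(?!delayed\b)\w+[\r\n\s]+)*?   #   as few words as needed, none of them delayed
            (?:merger|$)                      #   then another merger, or the end of the text
            (?:[\s\r\n]|$)
          ))
         |
          (?=(                                # group 2: delayed shows up too early
            [\s\r\n]+
            (?:\w+[\r\n\s]+){0,4}             #   at most four words in between
            delayed
            (?:[\s\r\n]|$)
          ))
        )",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
}
```

Both branches are lookaheads, so only the leading ` merger` is consumed and the text after it stays available for checking the next `merger`.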

Example
Sample Text
Note the line break
pqr hello world merger was merger to be
delayed undertaken between merger xyz as the sole acquirer but got delayed
Code
using System;
using System.Text.RegularExpressions;

namespace myapp
{
    class Class1
    {
        static void Main(string[] args)
        {
            // Sample text from above, including the line break.
            string sourcestring = "pqr hello world merger was merger to be\n"
                + "delayed undertaken between merger xyz as the sole acquirer but got delayed";
            Regex re = new Regex(@"(?:^|\s)merger(?:(?=([\s\r\n]+(?:(?!delayed\b)\w+[\r\n\s]+)*?(?:merger|$)(?:[\s\r\n]|$)))|(?=([\s\r\n]+(?:\w+[\r\n\s]+){0,4}delayed(?:[\s\r\n]|$))))", RegexOptions.IgnoreCase | RegexOptions.Singleline);
            MatchCollection mc = re.Matches(sourcestring);
            string[] groupNames = re.GetGroupNames();
            int mIdx = 0;
            foreach (Match m in mc)
            {
                // Group 0 is the whole match; groups 1 and 2 are the two lookahead captures.
                for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                {
                    Console.WriteLine("[{0}][{1}] = {2}", mIdx, groupNames[gIdx], m.Groups[gIdx].Value);
                }
                mIdx++;
            }
        }
    }
}
Matches
Note that these are the bad instances, which break your defined rules. If there were no matches, then it would be a good string. If capture group 1 is populated, then that `merger` did not have a corresponding `delayed`. If capture group 2 is populated, then `merger` had a `delayed` within the first 4 words.
[0][0] = merger
[0][1] = was merger
[0][2] =
[1][0] = merger
[1][1] =
[1][2] = to be
delayed
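
The "use program logic to accept or reject" step could look something like the sketch below. It is only one way to do it: `MergerCheck`, `IsGood` and `Describe` are hypothetical names, and the only rule assumed is the one stated above, that any match means the string is bad.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MergerCheck
{
    // Same pattern as above; every match is one rule violation.
    private static readonly Regex Violations = new Regex(
        @"(?:^|\s)merger(?:(?=([\s\r\n]+(?:(?!delayed\b)\w+[\r\n\s]+)*?(?:merger|$)(?:[\s\r\n]|$)))|(?=([\s\r\n]+(?:\w+[\r\n\s]+){0,4}delayed(?:[\s\r\n]|$))))",
        RegexOptions.IgnoreCase);

    // A string is good only if the pattern finds nothing to complain about.
    public static bool IsGood(string text)
    {
        return !Violations.IsMatch(text);
    }

    // Explain each violation by looking at which capture group was populated.
    public static List<string> Describe(string text)
    {
        var problems = new List<string>();
        foreach (Match m in Violations.Matches(text))
        {
            // The match may start with one whitespace character; skip it to find the word itself.
            int wordStart = m.Index + m.Length - "merger".Length;
            if (m.Groups[1].Success)
            {
                problems.Add("merger at offset " + wordStart + ": no delayed before the next merger or the end");
            }
            else if (m.Groups[2].Success)
            {
                problems.Add("merger at offset " + wordStart + ": delayed within the first 4 words");
            }
        }
        return problems;
    }
}
```

For example, `MergerCheck.IsGood(text)` can gate further processing, while `MergerCheck.Describe(text)` is handy when you want to report why a string was rejected.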