
I have a string

"pqr hello world merger was to be undertaken between xyz as the sole acquirer but got delayed"

I want to make sure that "delayed" always comes 5 or more words after "merger".

How can this be achieved using regex and C#?


Solved

Got the answer after reading this: http://www.princeton.edu/~mlovett/reference/Regular-Expressions.pdf

Solution:

Regex.IsMatch(articlecontent.ToLower().Trim(), @"\bmerger\W+(?:\w+\W+){5," + count_of_words_in_article + @"}?\bdelayed", RegexOptions.Multiline)

The idea is to find "merger" followed by "delayed" at any distance, as long as at least 5 words separate them.
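The same idea can be sketched without counting the words in the article first, because an open-ended quantifier such as {5,} has no upper bound. This is only a minimal sketch: the class name MergerCheck, the method DelayedFollowsMerger and its minWords parameter are hypothetical names chosen for illustration, and RegexOptions.IgnoreCase stands in for the ToLower() call.

```csharp
using System;
using System.Text.RegularExpressions;

static class MergerCheck
{
    // Hypothetical helper: true when "delayed" follows "merger" with at
    // least minWords whole words between them.
    public static bool DelayedFollowsMerger(string text, int minWords = 5)
    {
        // {n,} means "n or more", so no upper bound has to be computed.
        string pattern = @"\bmerger\W+(?:\w+\W+){" + minWords + @",}delayed\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }

    static void Main()
    {
        string s = "pqr hello world merger was to be undertaken between xyz as the sole acquirer but got delayed";
        Console.WriteLine(DelayedFollowsMerger(s));                            // True
        Console.WriteLine(DelayedFollowsMerger("merger was quickly delayed")); // False
    }
}
```

The second call returns false because only two words separate the keywords.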

Anirudh C
  • Exactly 5 words, or more than 5 words? – Anirudha Jul 03 '13 at 11:42
  • possible duplicate of [Regex to match a word NOT within a specific number of words of another word](http://stackoverflow.com/questions/4025233/regex-to-match-a-word-not-within-a-specific-number-of-words-of-another-word) – Ro Yo Mi Jul 04 '13 at 16:14

3 Answers


You can use a lookbehind. As written, it requires exactly 5 words between the two; change {5} to {5,} to allow 5 or more:

(?<=merger(\s+\w+){5}\s+)delayed
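As a rough sketch of how the lookbehind could be used from C#, the snippet below applies the "5 or more" variant with {5,} and added \b word boundaries. It relies on .NET allowing variable-length lookbehind. The names LookbehindDemo and FindDelayed are hypothetical, and the helper returns the index of the matched "delayed" or -1.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Only "delayed" is consumed; the lookbehind checks what precedes it.
    static readonly Regex DelayedAfterMerger = new Regex(
        @"(?<=\bmerger(?:\s+\w+){5,}\s+)delayed\b",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: index of the first qualifying "delayed", or -1.
    public static int FindDelayed(string text)
    {
        Match m = DelayedAfterMerger.Match(text);
        return m.Success ? m.Index : -1;
    }

    static void Main()
    {
        string s = "pqr hello world merger was to be undertaken between xyz as the sole acquirer but got delayed";
        Console.WriteLine(FindDelayed(s) == s.LastIndexOf("delayed"));  // True
        Console.WriteLine(FindDelayed("merger was quickly delayed"));   // -1
    }
}
```

Because the match itself is just the word "delayed", this form is convenient when the position of that word is needed, for example to highlight it.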
Anirudha

Foreword

The problem I have with this type of question:

  • What happens if you have "pqr hello world merger was merger to be undertaken between merger xyz as the sole acquirer but got delayed"? Each instance of merger is followed by at least one delayed that is more than 4 words away; however, each merger doesn't have its own delayed.

It would be easier to test for the bad things and then use program logic to accept or reject the result.

Description

This regex will match all of the strings which violate your conditions. If the regex doesn't match, then the string should be considered good.

  • is there a merger which has a trailing merger
  • does each merger have a corresponding instance of delayed
  • does the delayed appear within the first 4 words after each merger

In addition to looking for the bad things the expression should do the following:

  • handle multiple line strings correctly
  • ensure merger and delayed are not part of a larger word

(?:^|\s)merger(?:(?=([\s\r\n]+(?:(?!delayed\b)\w+[\r\n\s]+)*?(?:merger|$)(?:[\s\r\n]|$)))|(?=([\s\r\n]+(?:\w+[\r\n\s]+){0,4}delayed(?:[\s\r\n]|$))))


Example

Sample Text

Note the line break

pqr hello world merger was merger to be 
delayed undertaken between merger xyz as the sole acquirer but got delayed

Code

using System;
using System.Text.RegularExpressions;

namespace myapp
{
    class Class1
    {
        static void Main(string[] args)
        {
            // The sample text from above, including the line break.
            string sourcestring = "pqr hello world merger was merger to be \ndelayed undertaken between merger xyz as the sole acquirer but got delayed";
            // Group 1: merger has no delayed of its own. Group 2: delayed is within 4 words.
            Regex re = new Regex(@"(?:^|\s)merger(?:(?=([\s\r\n]+(?:(?!delayed\b)\w+[\r\n\s]+)*?(?:merger|$)(?:[\s\r\n]|$)))|(?=([\s\r\n]+(?:\w+[\r\n\s]+){0,4}delayed(?:[\s\r\n]|$))))", RegexOptions.IgnoreCase | RegexOptions.Singleline);
            string[] groupNames = re.GetGroupNames();
            MatchCollection mc = re.Matches(sourcestring);
            int mIdx = 0;
            foreach (Match m in mc)
            {
                // Print the whole match (group 0) and both capture groups.
                for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                {
                    Console.WriteLine("[{0}][{1}] = {2}", mIdx, groupNames[gIdx], m.Groups[gIdx].Value);
                }
                mIdx++;
            }
        }
    }
}

Matches

Note that these are the bad instances, which break your defined rules. If there were no matches, it would be a good string. If capture group 1 is populated, then that merger did not have a corresponding delayed. If capture group 2 is populated, then merger had a delayed within the first 4 words.

[0][0] =  merger
[0][1] =  was merger 
[0][2] = 

[1][0] =  merger
[1][1] = 
[1][2] =  to be 
delayed 
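The "program logic" step mentioned earlier could look something like the sketch below: the string is accepted only when the violation regex finds nothing. MergerValidator and IsGood are hypothetical names, and the sketch checks only the cases this particular expression detects.

```csharp
using System;
using System.Text.RegularExpressions;

static class MergerValidator
{
    // The violation-finding expression from this answer, unchanged.
    static readonly Regex Violation = new Regex(
        @"(?:^|\s)merger(?:(?=([\s\r\n]+(?:(?!delayed\b)\w+[\r\n\s]+)*?(?:merger|$)(?:[\s\r\n]|$)))|(?=([\s\r\n]+(?:\w+[\r\n\s]+){0,4}delayed(?:[\s\r\n]|$))))",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: a string is "good" when no violation is found.
    public static bool IsGood(string text)
    {
        return !Violation.IsMatch(text);
    }

    static void Main()
    {
        string bad = "pqr hello world merger was merger to be \ndelayed undertaken between merger xyz as the sole acquirer but got delayed";
        string good = "pqr hello world merger was to be undertaken between xyz as the sole acquirer but got delayed";
        Console.WriteLine(IsGood(bad));   // False
        Console.WriteLine(IsGood(good));  // True
    }
}
```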
Ro Yo Mi
  • Regex.IsMatch(articlecontent.ToLower().Trim(), @"\bmerger\W+(?:\w+\W+){5," + count_of_words_in_article + @"}?\bdelayed", RegexOptions.Multiline) – Anirudh C Jul 04 '13 at 11:23
  • Technically that finds the word delayed when it is near the word merger; however, that wasn't your original question, which said you needed to ensure the word delayed appeared 5 or more words away from merger. – Ro Yo Mi Jul 04 '13 at 15:08
  • I guess 5 or more words away would be the same as NEAR if it yields the same result. I guess it works for me that way. Sorry if my explanation of the question was unclear. – Anirudh C Jul 05 '13 at 12:14

Try this (as written, it allows exactly four words between the two):

/merger\s+\w+\s+\w+\s+\w+\s+\w+\s+delayed/
Krishna