I am trying to accept a capture group only if the pattern matches and a specific word does not appear before the end of the group. I've tried a number of approaches and none seem to work, so clearly I'm not getting the concept:

https://regex101.com/r/iP2xY0/3 https://regex101.com/r/iP2xY0/4

Regardless of what I do, my capture group captures something. My goal is to return no match if the reject word exists in the middle of the pattern.

RC:\*.*?(?P<Capture>(Bob|David|Ted|Alice))(?!Reject).*
  • RC:* Hi Bob Smith<\person>
  • RC:* Hi David Jones *Notes Bla bla<\person>
  • RC:* Hi Ted Warren *Rejected <\person>

The named group Capture is supposed to return:

  • Bob
  • David
  • ''

So the "Reject" condition means: if the Capture group is found, followed by anything ending in <, capture it; if the word Reject appears between the captured name and the <, do not.

user3649739

1 Answer

I would recommend putting your negative look-ahead at the beginning of your pattern. This first checks whether your reject word exists anywhere in the rest of the string, and only if it isn't there does it try to match the rest of the pattern:

(?!.*Rejected.*)RC:\*.*?(?P<Capture>(Bob|David|Ted|Alice)).*

https://regex101.com/r/iP2xY0/6
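As a rough check outside regex101, the pattern above can be run against the three sample lines from the question with Python's `re` module. This is only a sketch: the helper name `capture_name` and the choice to return an empty string for "no match" are assumptions made for illustration, not part of the original answer.

```python
import re

# Pattern from the answer: the negative look-ahead is tested before
# anything else is matched.
PATTERN = re.compile(
    r"(?!.*Rejected.*)RC:\*.*?(?P<Capture>(Bob|David|Ted|Alice)).*"
)

# The three sample lines from the question.
SAMPLES = [
    r"RC:* Hi Bob Smith<\person>",
    r"RC:* Hi David Jones *Notes Bla bla<\person>",
    r"RC:* Hi Ted Warren *Rejected <\person>",
]


def capture_name(line):
    """Return the Capture group, or '' when the line does not match."""
    m = PATTERN.search(line)
    return m.group("Capture") if m else ""


results = [capture_name(s) for s in SAMPLES]
print(results)
```

The third line yields no match because "Rejected" is still ahead of every position where `RC:` could start, so the look-ahead fails before the name is ever reached.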

bunji
  • That does in fact work; I guess that's why it's called a "negative look-ahead" :). I accepted the answer because it is indeed correct, yet I run into timeouts with the way I am using it. I have two 'pipes' (alternatives): use the first pipe's capture group if a term does not appear, and use the second pipe if it does: https://regex101.com/r/bU6cU6/1, using your negative look-ahead solution to qualify the first pipe. However, the negated phrase is actually around 3k chars into the text and I am getting timeouts: https://regex101.com/r/bU6cU6/2 Is there a way around that, or is it just a function of negative look-ahead? – user3649739 Sep 15 '16 at 03:23
  • The strange thing about the timeout is that it only takes 32 steps for the regex to find out that the first pipe does not work on its own, and 18 steps to find out that the second pipe does work on its own. So I'm not sure what the regex engine is doing that would make it time out. Shouldn't it a) check the first pipe, 32 steps, reject, then b) move on to the second pipe, 18 steps, accept? – user3649739 Sep 15 '16 at 03:52
  • Put the lookahead after `RC:` to reduce unnecessary backtracking. – Wiktor Stribiżew Sep 16 '16 at 05:21
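
One possible reading of that last suggestion, sketched in Python: the look-ahead is moved so that it is only evaluated once the literal `RC:` has matched, instead of at every starting position in the text. The exact placement, the name `capture_name_anchored`, and the 3,000-character filler line are assumptions for illustration; the actual two-pipe pattern behind the regex101 links is not reproduced here.

```python
import re

# Assumed placement: the look-ahead sits right after the literal "RC:",
# so the long scan for "Rejected" only happens where "RC:" was found.
ANCHORED = re.compile(
    r"RC:(?!.*Rejected)\*.*?(?P<Capture>(Bob|David|Ted|Alice)).*"
)

SAMPLES = [
    r"RC:* Hi Bob Smith<\person>",
    r"RC:* Hi David Jones *Notes Bla bla<\person>",
    r"RC:* Hi Ted Warren *Rejected <\person>",
]


def capture_name_anchored(line):
    """Return the Capture group, or '' when the line does not match."""
    m = ANCHORED.search(line)
    return m.group("Capture") if m else ""


anchored_results = [capture_name_anchored(s) for s in SAMPLES]

# Made-up long line with the reject word roughly 3k characters in.
long_line = "RC:* Hi Ted " + "x" * 3000 + " *Rejected <\\person>"
long_result = capture_name_anchored(long_line)

print(anchored_results, repr(long_result))
```

With the look-ahead at the very start of an unanchored pattern, the engine may rescan the remaining text for "Rejected" from each starting position; placing it after `RC:` limits that scan to positions where `RC:` actually occurs.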