I have a working regex:

~<([A-Za-z_\-]+)[^>]*>(*SKIP)(*F)|([A-Za-z0-9<>:\[\]\s]|^|\n)text([A-Za-z0-9<>:\[\]\s]|$|\n)~sig

Now, I want to match wanna_match_this in <blockquote>wanna_match_this</blockquote> and I am trying to fix this:

<([A-Za-z_\-]+)[^>]*>(*SKIP)(*F)|([A-Za-z0-9<>:\[\]\s]|^|\n)wanna_match_this([A-Za-z0-9<>:\[\]\s]|$|\n)

It should match wanna_match_this in <blockquote>wanna_match_this</blockquote>, but it does not.

This one matches but I need the other one that is more precise.

Wiktor Stribiżew
Wellenbrecher

1 Answer

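The problem is that the first alternative (the one with (*SKIP)(*F)) consumes any <TAG>, and the engine then resumes searching right after the end of that skipped text. wanna_match_this starts exactly at that position, so the consuming group ([A-Za-z0-9<>:\[\]\s]|^|\n) cannot succeed in front of it: the > has already been consumed, ^ does not match because this is not the start of the string, and if the character class matches the w instead, the literal wanna_match_this can no longer match from the next character.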

So, you need zero-width assertions here, which check the neighbouring characters without consuming them: a positive lookbehind, best coupled with a positive lookahead:

<([A-Za-z_-]+)[^>]*>(*SKIP)(*F)|(?<=[A-Za-z0-9<>:\[\]\s]|^)wanna_match_this(?=[A-Za-z0-9<>:\[\]\s]|$)

See demo

Note that I removed \n, as it is already covered by \s.
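
To see the difference between the consuming groups and the lookarounds outside a PCRE tester, here is a minimal Python sketch. It is only an approximation: Python's built-in re module has no (*SKIP)(*F), so the sketch swaps in a different technique, letting the first alternative match the tag and keeping only matches where a capture group is set. The lookbehind is also written as (?:(?<=...)|^) because re requires fixed-width lookbehinds. The names TAG, BOUNDARY and find_targets are made up for this illustration.

```python
import re

# Hypothetical helper names; the patterns mirror the ones in this thread.
TAG = r"<[A-Za-z_-]+[^>]*>"
BOUNDARY = r"[A-Za-z0-9<>:\[\]\s]"
FLAGS = re.IGNORECASE | re.DOTALL

# Question's approach: the boundary characters are consumed.
consuming = re.compile(
    rf"{TAG}|(?:{BOUNDARY}|^)(wanna_match_this)(?:{BOUNDARY}|$)", FLAGS
)

# Answer's approach: the boundary characters are only asserted.
lookaround = re.compile(
    rf"{TAG}|(?:(?<={BOUNDARY})|^)(wanna_match_this)(?={BOUNDARY}|$)", FLAGS
)


def find_targets(pattern, text):
    """Return start offsets of the target word, ignoring matches that were tags.

    A match where group 1 is unset came from the tag alternative, which
    plays the role of the skipped text in the PCRE pattern.
    """
    return [m.start(1) for m in pattern.finditer(text) if m.group(1) is not None]


html = "<blockquote>wanna_match_this</blockquote>"

# The tag alternative eats "<blockquote>", so the consuming group has
# no character left to match in front of the word.
print(find_targets(consuming, html))

# The lookbehind can still see the ">" that was already matched.
print(find_targets(lookaround, html))

# Text inside a tag is still ignored by both versions.
print(find_targets(lookaround, '<a title="wanna_match_this">'))
```

With the consuming pattern the first list comes back empty, while the lookaround pattern reports the word at offset 12, right after the opening tag. In PCRE itself the pattern from the answer does this directly, without the capture-group filtering.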

Wiktor Stribiżew