Suppose that within a regex, once one alternative of an alternation matches, the alternation stops right there, even if more alternatives remain (and there are no other tokens in the regex outside the alternation).

Source

This pattern searches for a doubled word (e.g., this this):

\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)

I am confused by what happens with the following subject, which matches the pattern:

"<i>whatever<i>         whatever"

\b([a-z]+) matches the first word.

((?:\s|<[^>]+>)+) is followed by a tag, so the second alternative matches.

(\1\b) should match only if what follows is the same word, backreferenced from the first parentheses.

So why does the pattern match when what follows the tag is not (\1\b) but whitespace?

I know that \s is one of the alternatives in the alternation.

But isn't the tag match supposed to use up the alternation?

Why is the \s alternative still available?

Unihedron
nEAnnam
  • It's unclear what you are trying to do. I'd recommend using this tool when developing regular expressions: http://gskinner.com/RegExr/ – Chris Laplante Jun 22 '11 at 01:58

2 Answers

That + means "one or more of (?:\s|<[^>]+>)". Yes, the first repetition consumes the tag, but any number of additional tags or whitespace characters may follow before (\1\b) has to match.

\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)
                         ^
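
As a quick check of this, here is a minimal Python sketch (Python's re module is an assumption; the question does not name a regex flavor). It runs the pattern against the subject from the question, then against a hypothetical variant named single that drops the +, to show that the quantifier is what lets the group take the tag and then the spaces.

```python
import re

# Pattern from the question: word, separator run, same word again.
pattern = re.compile(r'\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)')

# Subject from the question: a tag and nine spaces between the two words.
subject = '<i>whatever<i>' + ' ' * 9 + 'whatever'

m = pattern.search(subject)
word, separator, repeat = m.groups()
print(repr(word))       # the first word
print(repr(separator))  # the tag followed by all the spaces
print(repr(repeat))     # the repeated word

# Hypothetical variant without the + quantifier: the group may match
# only one item (one tag OR one whitespace character), so the spaces
# after the tag block the backreference and there is no match.
single = re.compile(r'\b([a-z]+)((?:\s|<[^>]+>))(\1\b)')
print(single.search(subject))
```

Group 2 ends up holding both the tag and the whitespace, which is only possible because the alternation is attempted again on every repetition.
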
AndreKR
  • Thanks for the help, but I can't accept two answers. The page shows that you answered a few seconds after Alan, so I accepted his answer and upvoted yours. Thanks – nEAnnam Jun 22 '11 at 02:44
  • No problem, it's often done that way when two answers say the same in different words. – AndreKR Jun 22 '11 at 12:50

The alternation is controlled by a + quantifier:

(?:\s|<[^>]+>)+

...so it tries to match multiple times. Each time, it may try both alternatives: first \s, and if that fails, <[^>]+>.

The first time, \s fails to match, but <[^>]+> succeeds in matching <i>.

The second time, \s matches one space.

The third time, \s matches another space.

...and so on, until all the spaces are consumed.
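
The repetitions described above can be traced with a small Python sketch. This is a simplified emulation under stated assumptions, not how the engine is implemented: it applies the bare alternation (here named item, a hypothetical helper) again and again from the end of the first word and records what each pass consumes. It ignores backtracking, which this particular subject does not need.

```python
import re

# One pass through the alternation, without the + quantifier.
item = re.compile(r'\s|<[^>]+>')

# Subject from the question: a tag and nine spaces between the two words.
subject = '<i>whatever<i>' + ' ' * 9 + 'whatever'

# Start right after the first "whatever", where group 2 begins.
pos = subject.index('whatever') + len('whatever')

steps = []
while True:
    m = item.match(subject, pos)   # try \s first, then <[^>]+>
    if m is None:                  # neither alternative matches: loop ends
        break
    steps.append(m.group())
    pos = m.end()

for n, text in enumerate(steps, 1):
    print(n, repr(text))

# What is left is what the backreference (\1\b) has to match.
print(repr(subject[pos:]))
```

The first pass consumes the tag through the second alternative, and every later pass consumes a single space through the first one, until only the repeated word remains.
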

Alan Moore