
I'm modifying a regular expression so that one outer group wraps several inner groups, but this 'supergroup' does not return the whole composite matched string as I expected.

The string to match is of the form:

/DIR/SOMESTRING-W0.12+345.raw.gz

and the regex I'm using:

/DIR/
(?P<super>
    (?P<name>.*?)
    (?=(?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$)
)

I'm getting the following results for the named groups:

modifier: '-W0.12+345'
super: 'SOMESTRING'
name: 'SOMESTRING'
extension: 'raw.gz'

while I was expecting

super: 'SOMESTRING-W0.12+345.raw.gz'

The grouping of subgroups has always worked for me, but not this time, and I cannot understand why.

I hope someone can give me a hint.

NOTE: The explanation of this regex can be found in the question "matching a specific substring with regular expressions using awk".

RogerFC

1 Answer


The group super matches the same text that the group name matches, because the lookahead assertion doesn't contribute any actual characters to the match (that's why lookaheads are also called "zero-width assertions").
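
A minimal sketch of the zero-width behaviour, assuming Python's re module (the (?P<name>...) syntax suggests it, but the question doesn't say which engine is in use). The pattern and the "foobar" input are made up for illustration:

```python
import re

# Lookahead: "bar" must follow "foo", but it is only checked, not consumed,
# so it never becomes part of the enclosing group.
with_lookahead = re.match(r"(?P<outer>foo(?=bar))", "foobar")

# Ordinary (non-capturing) group: "bar" is consumed and ends up inside "outer".
without_lookahead = re.match(r"(?P<outer>foo(?:bar))", "foobar")

print(with_lookahead.group("outer"), with_lookahead.end())        # foo 3
print(without_lookahead.group("outer"), without_lookahead.end())  # foobar 6
```

The match with the lookahead ends at position 3, right after "foo", even though the assertion inspected the characters up to position 6.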

To get the desired result, just remove the lookahead assertion:

/DIR/
(?P<super>
    (?P<name>.*?)
    (?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$
)
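
For comparison, here is a rough sketch that runs both patterns against the sample path. It assumes Python's re module and the re.VERBOSE flag (needed because the pattern is spread over several lines); the variable names are only for illustration:

```python
import re

path = "/DIR/SOMESTRING-W0.12+345.raw.gz"

# Pattern from the question: the tail sits inside a lookahead.
original = re.compile(r"""
    /DIR/
    (?P<super>
        (?P<name>.*?)
        (?=(?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$)
    )
""", re.VERBOSE)

# Same pattern without the lookahead: the tail is now consumed.
fixed = re.compile(r"""
    /DIR/
    (?P<super>
        (?P<name>.*?)
        (?P<modifier>-W\d\.\d{2}[+-]\d{3})?\.(?P<extension>raw\.gz|root)$
    )
""", re.VERBOSE)

before = original.match(path).groupdict()
after = fixed.match(path).groupdict()

print(before["super"])  # SOMESTRING
print(after["super"])   # SOMESTRING-W0.12+345.raw.gz
print(after["name"], after["modifier"], after["extension"])
# SOMESTRING -W0.12+345 raw.gz
```

The groups name, modifier and extension come out the same either way; only super changes, because it now spans the characters the inner groups consume.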
Tim Pietzcker
  • Wow, so easy as that! I was under the impression that the lookahead was absolutely necessary. Thanks! – RogerFC Apr 09 '13 at 08:05