I am trying to capture the same named group from two alternative pattern sequences. This SO question solves the problem in PCRE using the mode modifier (?J), and this SO question solves a related problem in Python, but I haven't succeeded in applying it to my use case.

Example test strings:

abc-CAPTUREME-xyz-abcdef
abc-xyz-CAPTUREME-abcdef

Desired output:

CAPTUREME
CAPTUREME

CAPTUREME appears on either the left or right of the xyz sequence. My initial failed attempt at a regex looked like this:

r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'

But with Python regexes that pattern yields an error at the second (?P<cap> ("A subpattern name must be unique"), and Python doesn't support the (?J) modifier that the first answer linked above uses to solve the problem.
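
For reference, a minimal sketch of what compiling that pattern with Python's standard re module does. The variable names are illustrative, and the wording of the error raised by re may differ from the message quoted above.

```python
import re

# The failed attempt: the name "cap" is used for two different groups.
pattern = r'abc-(xyz-(?P<cap>\w+)|(?P<cap>\w+)-xyz)-abcdef'

try:
    re.compile(pattern)
    error_message = None
except re.error as exc:
    # Python's re module rejects duplicate group names at compile time.
    error_message = str(exc)

print(error_message)
```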

With a single capture group I can capture CAPTUREME-xyz or xyz-CAPTUREME, but I can't reproduce the lookaround approach from the second Stack Overflow question linked above. Every attempt to replicate it simply fails to match my string, and there are too many differences for me to piece together what is happening.

r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef'

https://regex101.com/r/NeWrDe/1
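
A short sketch of what the single-group pattern above returns with Python's re module, assuming the two example test strings from the question. It shows that the xyz part ends up inside the named group.

```python
import re

# Single named group wrapping both alternatives, as in the attempt above.
pattern = re.compile(r'abc-(?P<cap>(xyz-)\w+|\w+(-xyz))-abcdef')

tests = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']

# The named group includes the surrounding "xyz" text, which is the problem.
captured = [pattern.search(s).group('cap') for s in tests]
print(captured)  # ['CAPTUREME-xyz', 'xyz-CAPTUREME']
```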

David Parks

1 Answer

Looking at the second article, you could write the pattern as:

(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))

Explanation

  • (?P<cap> Named group cap
    • (?<=abc-xyz-)\w+ Match 1+ word characters, asserting abc-xyz- to the left
    • | Or
    • \w+(?=-xyz-abcdef) Match 1+ word characters, asserting -xyz-abcdef to the right
  • ) Close group cap
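
As a quick check, here is a minimal sketch that runs this lookaround pattern with Python's re module against the two test strings from the question. The variable names are only illustrative.

```python
import re

# Named group whose alternatives use a lookbehind or a lookahead,
# so the surrounding text is asserted but not captured.
pattern = re.compile(r'(?P<cap>(?<=abc-xyz-)\w+|\w+(?=-xyz-abcdef))')

tests = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']

results = [pattern.search(s).group('cap') for s in tests]
print(results)  # ['CAPTUREME', 'CAPTUREME']
```

Note that each alternative asserts context on one side only, so in this sketch a string such as abc-xyz-CAPTUREME-zzz would also produce a match.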


Another option in Python could be using a conditional and a capture group:

abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef

Explanation

  • abc-(xyz-)? Match abc- and optionally capture xyz- in group 1
  • (?P<cap>\w+) Named group cap, match 1+ word characters
  • - Match literally
  • (?(1)|xyz-) Conditional: if group 1 matched, match nothing; otherwise match xyz-
  • abcdef Match literally
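
A minimal sketch of the conditional pattern in use with Python's re module, again assuming the two test strings from the question. The third string is a hypothetical input added here to show that xyz on both sides is rejected.

```python
import re

# Group 1 optionally consumes a leading "xyz-"; the conditional (?(1)|xyz-)
# then requires a trailing "xyz-" only when group 1 did not participate.
pattern = re.compile(r'abc-(xyz-)?(?P<cap>\w+)-(?(1)|xyz-)abcdef')

tests = ['abc-CAPTUREME-xyz-abcdef', 'abc-xyz-CAPTUREME-abcdef']

results = [pattern.search(s).group('cap') for s in tests]
print(results)  # ['CAPTUREME', 'CAPTUREME']

# Hypothetical extra input: "xyz" on both sides does not match.
both_sides = pattern.search('abc-xyz-CAPTUREME-xyz-abcdef')
print(both_sides)  # None
```

Because the whole surrounding sequence is matched rather than only asserted, this form checks both the abc- prefix and the abcdef suffix.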

The fourth bird
    The second solution works and is the most general purpose; the first solution requires that the rest of the match string be trivial text, which may not be the case in practice. Thanks for this! The conditional capture group was quite new to me. That's a nice trick, and not one I came across when searching for this, so hopefully it gets found by others. – David Parks Mar 04 '23 at 01:42