I want to build several sub-expressions into a larger regular expression, where each sub-expression matches at one place in the input or at another, but never at both, preferably using the same named group per "area of interest". For example, in the lines below I'd like to match the volume units (gal, gallons) and the currency units (USD, CAD).

  • $3.23 USD / gal.
  • USD 3.23 in gallons
  • 4.50 CAD / gal
  • 1 gal @ USD 3.23
  • 10 gal. @ $4.50 CAD

Or more generally:

  • stuffmorestuffXXXyetmorestuff
  • stuffXXXmorestuff

where stuff and morestuff could be a complex set of sub-expressions.

It seems like it might be possible using some combination of

  • group stack push/pop
  • balancing groups
  • look-around

but I'm not sure how to proceed. Does it come down to alternations (|) or multiple passes with different expressions (which I suppose amounts to the same thing)?

Kit
  • You'll need better examples. In the three that you show, your regular expression is simply `"XXX"`. Or, I suppose `"USD|CAD"`. Give a few examples where that won't do what you want. – Jim Mischel Dec 02 '11 at 22:53
  • @Kit you will probably have to take into account all possible conditions for this to work. – FailedDev Dec 02 '11 at 22:53
  • In your first example, there doesn't seem to be any need for the regex to specify different units: if they come out the same then the conversion is easy. – Matthew Strawbridge Dec 02 '11 at 22:54
  • The units are only a part of it. There are many other named captures, which seem to prevent a single capture in multiple positions. – Kit Dec 05 '11 at 18:49
  • @JimMischel - I added two more examples and added some detail explaining I'm really looking to do this multiple times (i.e. multiple named groups in different places). – Kit Dec 05 '11 at 20:39

1 Answer

You will probably have to use alternation, something like this (the non-capturing group makes ^ and $ apply to both alternatives, not just one each):

^(?:(stuffmorestuff)XXX(yetmorestuff)|(stuff)XXX(morestuff))$

You will end up with four capture groups, though. I'm not sure how the .NET regex engine behaves if you use the same group name for several groups.
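
As a rough illustration of the alternation approach, here is a minimal sketch in Python rather than .NET. Python's `re` module rejects a pattern that defines the same group name twice, so the sketch gives each alternative its own names and merges them afterwards. In .NET, by contrast, a group name may be repeated within one pattern and the captures land in the same named group, which would make the merging step unnecessary. The pattern, the group names (`cur_a`, `cur_b`, and so on) and the `parse` helper are all hypothetical, and only two of the layouts from the question are covered.

```python
import re

# Hypothetical pattern covering two layouts from the question.
# Each alternative uses its own group names because Python's re
# does not allow one name to be defined twice in a pattern.
PATTERN = re.compile(
    r"""
    ^\s*
    (?:
        # Alternative A: currency first, e.g. USD 3.23 in gallons
        (?P<cur_a>USD|CAD)\s+(?P<amt_a>\d+(?:\.\d+)?)\s+in\s+(?P<vol_a>gal(?:lons?)?)\.?
      |
        # Alternative B: amount first, e.g. $3.23 USD / gal.
        \$?(?P<amt_b>\d+(?:\.\d+)?)\s+(?P<cur_b>USD|CAD)\s*/\s*(?P<vol_b>gal(?:lons?)?)\.?
    )
    \s*$
    """,
    re.VERBOSE,
)


def parse(line):
    """Return currency, amount and volume unit, or None if nothing matches."""
    m = PATTERN.match(line)
    if m is None:
        return None
    g = m.groupdict()
    # Only one alternative can match, so at most one of each pair is set.
    return {
        "currency": g["cur_a"] or g["cur_b"],
        "amount": g["amt_a"] or g["amt_b"],
        "volume": g["vol_a"] or g["vol_b"],
    }


if __name__ == "__main__":
    for text in ("$3.23 USD / gal.", "USD 3.23 in gallons", "1 gal @ USD 3.23"):
        print(text, "->", parse(text))
```

Each additional layout becomes one more alternative, so the pattern grows with the number of orderings you need to support. That is the main cost of this approach compared with running several simpler expressions in turn.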

Mattias Wadman