7

I'm writing a Perl regex to match both the strings x bla and [x] bla. One alternative is /(?:x|\[x\]) bla/. This isn't desirable, because in the real world, x is more complicated, so I want to avoid repeating it.

The best solution so far is putting x in a variable and pre-compiling the regex:

my $x = 'x';
my $re = qr/(?:$x|\[$x\]) bla/o;

Is there a neater solution? In this case, readability is more important than performance.

Tim
  • 1
    Incidentally, `qr//o` doesn't mean anything :) – hobbs Jun 26 '11 at 17:49
  • @hobbs: I just read through perlop again, and you're right (except if `$x` would change). Thanks! – Tim Jun 26 '11 at 17:53
  • I think your solution is fine (you might compile `$x=qr/x/`, as I now see robert already mentioned). Also, while I can't find it right now, I have read several times that `/o` isn't needed anymore. I would ask our resident guru @tchrist for proof of that. – Joel Berger Jun 26 '11 at 18:09

5 Answers

9

It's possible, but not all that clean. You can use the fact that conditional subpatterns support tests such as (?(N)) to check that the Nth capturing subpattern successfully matched. So you can use an expression such as /(\[)?X(?(1)\])/ to match '[X]' or 'X'.
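
Applied to the question's strings, the conditional might look like the sketch below. The `\A` and `\z` anchors, the sample strings, and the helper name `matches_optional_brackets` are assumptions added for illustration; `X` stands in for the real, more complicated pattern.

```perl
use strict;
use warnings;

# Group 1 captures an optional "[". The conditional (?(1)\]) then requires
# a closing "]" only when group 1 actually took part in the match.
# Anchors are an assumption here so that partial matches do not count.
my $re = qr/\A(\[)?X(?(1)\]) bla\z/;

# Hypothetical helper: returns 1 on a match, 0 otherwise.
sub matches_optional_brackets {
    my ($s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('X bla', '[X] bla', '[X bla', 'X] bla') {
    printf "%-8s %s\n", $s, matches_optional_brackets($s) ? 'match' : 'no match';
}
```

With the anchors in place, only the first two strings should match; the pattern for X appears once.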

jaytea
  • I was looking for this, but couldn't quite dig up the right syntax. Yours beats mine :) – hobbs Jun 26 '11 at 17:57
  • Nice idea. I think this is the neatest code I'm going to get :) – Tim Jun 26 '11 at 18:02
  • 2
    There's also a trick you can use for simpler regex flavours that don't support conditional subpatterns: `/(\[())?X(\2\])?/`. With this, an empty string is matched as \2 if the open bracket is present, which you can check for later. – jaytea Jun 26 '11 at 19:53
  • Neat. If only it worked in all RE engines. (Some bind a subexpression to the empty string by default.) – Donal Fellows Jun 26 '11 at 20:07
  • The ECMAScript/JavaScript equivalent is `/(\[?)x(?:(?!.*$\1)\]|(?=.*$\1))/` – Deadcode Feb 20 '21 at 16:29
1

You can pre-compile $x as well. This also makes errors a little more obvious if $x is really ?(+[*{ or something else that a regex compiler will completely freak out on.

my $x = qr/x/;
my $re = qr/(?:$x|\[$x\]) bla/o;
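
To see the "more obvious errors" point in action, here is a rough sketch. The helper name `compile_fragment` is made up for illustration, and the `eval` wrapper is just one way of surfacing the compile error; without it, `qr//` would simply die at the line where the bad fragment is compiled.

```perl
use strict;
use warnings;

# Hypothetical helper: try to compile a pattern fragment on its own.
# Returns the compiled regex (or undef) plus any error message.
sub compile_fragment {
    my ($src) = @_;
    my $re = eval { qr/$src/ };
    return ($re, $@);
}

my ($x)         = compile_fragment('x');        # compiles fine
my ($bad, $err) = compile_fragment('?(+[*{');   # rejected by the regex compiler

print defined $bad ? "compiled\n" : "rejected: $err";

# The good fragment can then be reused in the larger pattern.
my $re = qr/(?:$x|\[$x\]) bla/;
print "ok\n" if '[x] bla' =~ $re;
```

The failure is reported against the fragment itself rather than against the larger pattern it would have been embedded in.
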
robert
1

There isn't a neater solution really, because this is where we leave the domain of regular languages and start requiring a more complex automaton with some kind of memory. (Backrefs would do it, except that the backref expands to a literal match against a preceding part of the string, not to “this, but only if that was matched”.)

Sometimes, it's possible to instead use a two step process, replacing a complex X with a single character known to not be present in the source text (control characters can be suitable for that) so allowing a simpler second-stage match. Not always possible though; depends on what you're matching.
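
A minimal sketch of that two-step idea, under stated assumptions: the pattern in `$x` is a made-up stand-in for the complicated X, `\x01` is assumed not to occur in the input, and `matches_via_placeholder` is a hypothetical helper name.

```perl
use strict;
use warnings;

my $x           = qr/x(?:y+|z)/;   # hypothetical stand-in for the complicated X
my $placeholder = "\x01";          # control character assumed absent from the input

# Hypothetical helper: step 1 collapses every X to the placeholder,
# step 2 runs a simple match against the simplified text.
sub matches_via_placeholder {
    my ($text) = @_;

    # Refuse input that already contains the placeholder, since the
    # substitution would then be ambiguous.
    return 0 if index($text, $placeholder) >= 0;

    (my $simplified = $text) =~ s/$x/$placeholder/g;
    return $simplified =~ /\A(?:$placeholder|\[$placeholder\]) bla\z/ ? 1 : 0;
}

for my $s ('xyy bla', '[xz] bla', '[xz bla', 'q bla') {
    printf "%-9s %s\n", $s, matches_via_placeholder($s) ? 'match' : 'no match';
}
```

The second-stage pattern still mentions the placeholder twice, but that is one character rather than the whole of X.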

Donal Fellows
1

You can write something like (\[)?x(??{ defined $1 ? "]" : "" }) but you probably shouldn't.

hobbs
1

I tested the /(\[)?X(?(1)\])/ solution (which garnered a score of 7), and it also matched [X and X], which are incorrect. The original poster's /(?:$x|\[$x\]) bla/ actually works, requiring either matched brackets or none.
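
Whether `[X` and `X]` count as matches depends on anchoring, so a side-by-side check may help. The sketch below is only an illustration: the `\A` and `\z` anchors, the sample strings, and the helper name `check` are assumptions, and the trailing ` bla` is left off both patterns to keep the comparison like for like.

```perl
use strict;
use warnings;

my $x    = 'X';
my $cond = qr/(\[)?$x(?(1)\])/;   # conditional form
my $alt  = qr/(?:$x|\[$x\])/;     # alternation form, minus " bla"

# Hypothetical helper: report how each pattern fares, unanchored and anchored.
sub check {
    my ($s) = @_;
    return +{
        cond_unanchored => ($s =~ $cond         ? 1 : 0),
        alt_unanchored  => ($s =~ $alt          ? 1 : 0),
        cond_anchored   => ($s =~ /\A$cond\z/   ? 1 : 0),
        alt_anchored    => ($s =~ /\A$alt\z/    ? 1 : 0),
    };
}

for my $s ('X', '[X]', '[X', 'X]') {
    my $r = check($s);
    printf "%-4s cond=%d alt=%d cond_anchored=%d alt_anchored=%d\n",
        $s, @{$r}{qw(cond_unanchored alt_unanchored cond_anchored alt_anchored)};
}
```

Unanchored, both patterns find the bare `X` inside `[X` and `X]`; anchored, both accept only `X` and `[X]`.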

Alan Moore
Tom Williams