32

Is there an elegant way to build a variable-length look-behind regex such as this one?

/(?<=eat_(apple|pear|orange)_)today|yesterday/g;

Perl seems to have a very impressive regex engine, and variable-length lookbehind would be very interesting. Is there a way to make it work, or should I forget this bad idea?

nowox
  • Did you want to match only a `today` or `yesterday` that comes right after `eat_apple_`, `eat_pear_`, or `eat_orange_`? – Avinash Raj Aug 29 '14 at 08:17
  • 1
    This is certainly possible today with `/((?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))today|yesterday/g` But it’s much less elegant than what we might want. The REAL limitation is matchers with *s and +s or other wide range of lengths. My understanding is that .NET’s implementation works around this by reversing the pattern and the string. – Loren Osborn Oct 02 '18 at 23:15

7 Answers

27

Use \K as a special case.

It can stand in for a variable-length positive lookbehind assertion:

/eat_(?:apple|pear|orange)_\Ktoday|yesterday/g

Alternatively, you can list out your lookbehind assertions separately:

/(?:(?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))today|yesterday/g

However, I would suggest that it is rare to find a problem that needs variable-length lookbehind and cannot be rethought to use a combination of other, more common regex features.

In other words, if you get stuck on a specific problem, feel free to share it here, and I'm sure someone can come up with a different (perhaps better) approach.
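
For what it's worth, here is a minimal sketch of the `\K` form in use. The sample string and variable names are made up for illustration, and the final alternation is grouped on the assumption that both words should require the prefix.

```perl
use strict;
use warnings;

# Hypothetical sample input; eat_kiwi_ is deliberately not an accepted prefix.
my $text = 'eat_apple_today eat_kiwi_today eat_pear_yesterday';

# \K discards everything matched so far from the reported match, so the
# variable-length prefix is required but not returned.
my @words = $text =~ /eat_(?:apple|pear|orange)_\K(?:today|yesterday)/g;
print "@words\n";

# In a substitution, only the part after \K is replaced.
(my $marked = $text) =~ s/eat_(?:apple|pear|orange)_\K(today|yesterday)/\U$1/g;
print "$marked\n";
```

One difference from a true lookbehind is that the prefix is consumed as part of the match, so it cannot overlap with text already used by a previous match.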

Miller
  • 14
    Nit: `\K` is not a variable-length lookbehind. It means "keep", and excludes whatever precedes it from being included in `$&` (the matched string). That said, `\K` can be used to emulate a variable-length positive lookbehind, so +1. – Michael Carman Aug 29 '14 at 13:22
11

How about:

(?:(?<=eat_apple_)|(?<=eat_pear_)|(?<=eat_orange_))(today|yesterday)

A little bit ugly, but it works.
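
If the list of prefixes grows, the alternation of fixed-length lookbehinds can be generated rather than typed by hand. This is only a sketch: the `@prefixes` list, the sample text and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical list of accepted prefixes; each one has a fixed length.
my @prefixes = qw(eat_apple_ eat_pear_ eat_orange_);

# Build one fixed-length lookbehind per prefix and join them with '|'.
my $behind = join '|', map { sprintf '(?<=%s)', quotemeta($_) } @prefixes;
my $re     = qr/(?:$behind)(today|yesterday)/;

# Hypothetical sample input; eat_kiwi_ is not in the list.
my $text  = 'eat_orange_today eat_kiwi_today eat_pear_yesterday';
my @found = $text =~ /$re/g;

print "@found\n";
```

Each generated lookbehind is fixed-length on its own, which is why this form does not depend on variable-length lookbehind support.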

brian d foy
Toto
  • 3
    +1 this is definitely a good workaround for a small finite list. – anubhava Aug 29 '14 at 08:07
  • +1 And `(?<=eat_apple_|eat_pear_|eat_orange_)(today|yesterday)` (without parenthesized subpatterns) would not work in Perl? It is still fixed length: [see example](http://regex101.com/r/sT2bH5/2) – Jonny 5 Aug 29 '14 at 08:58
  • 1
    @Jonny5: No, they're not equal in length. – Toto Aug 29 '14 at 09:03
4

Blog post found today, linked to me at #regex @ irc.freenode.org:

http://www.drregex.com/2019/02/variable-length-lookbehinds-actually.html

This article explains how to do a variable width look-behind in PCRE.

The solution would then be:

/(?=(?=(?'a'[\s\S]*))(?'b'eat_(?:apple|pear|orange)_(?=\k'a'\z)|(?<=(?=x^|(?&b))[\s\S])))today|yesterday/g

https://regex101.com/r/9DNpFj/1

Sebastián Palma
Doqnach
3

You can use look-ahead instead of look-behind:

/(?:eat_(apple|pear|orange)_)(?=today|yesterday)/g

More generally, there is often another way to express patterns that naively seem to require look-behind.
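
A small sketch of the look-ahead form, with one caveat: the match itself covers the `eat_..._` prefix, not the word that follows. Capturing inside the look-ahead, as done here, is an addition for illustration, and the sample text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input; eat_kiwi_ is not an accepted prefix.
my $text = 'eat_apple_today eat_kiwi_today eat_pear_yesterday';

my @pairs;
# The prefix is matched normally; the look-ahead only peeks at the next word.
# A capture group inside the look-ahead still fills $2.
while ($text =~ /eat_(apple|pear|orange)_(?=(today|yesterday))/g) {
    push @pairs, "$1:$2";
}

print "@pairs\n";
```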

perreal
3

Perl v5.30 adds experimental variable-width lookbehinds in situations where the regex engine knows that the length will be 255 characters or less (so, no unbounded quantifiers, for example).

This now works:

use v5.30;
use experimental qw(vlb);

$_ = 'eat_apple_today';
say "Matched!" if /(?<=eat_(apple|pear|orange)_)today|yesterday/g;
brian d foy
2

Alternative solution - reverse the string and use lookahead instead. It may look ugly having to write the pattern words in reverse but it's an option when everything else fails.
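
A rough sketch of that idea, assuming the same prefixes as in the question. The sample text and variable names are made up, and the pattern words are simply the originals spelled backwards.

```perl
use strict;
use warnings;

# Hypothetical sample input; eat_kiwi_ is not an accepted prefix.
my $text = 'eat_apple_today eat_kiwi_today eat_orange_yesterday';

# Reverse the input so the required prefix becomes a suffix.
my $reversed = reverse $text;

# Everything is written backwards: yadot = today, yadretsey = yesterday,
# _elppa_tae = eat_apple_, and so on. Look-ahead has no length restriction.
my @hits = map { scalar reverse $_ }
           $reversed =~ /(?:yadot|yadretsey)(?=_(?:elppa|raep|egnaro)_tae)/g;

# Matches are found right to left in the original, so flip the list back.
@hits = reverse @hits;

print "@hits\n";
```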

Slava
0

The solution that worked for me:
Temporarily make whatever is variable in length fixed in length.

In this case:
Change all your 'eat_apple's, 'eat_pear's and 'eat_orange's to something like eat_fruit, and then run the expression you were thinking of with an acceptable fixed length look-behind. Even though it takes two passes and some memory, I find the code way easier to read, and it might be faster than some of these other solutions.
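
As a sketch of those two passes, assuming `eat_fruit_` is a safe placeholder that does not already occur in the input (the sample text and variable names are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical sample input; eat_kiwi_ is not an accepted prefix.
my $text = 'eat_apple_today eat_kiwi_today eat_pear_yesterday';

# Pass 1: rewrite every accepted prefix to one fixed-length placeholder,
# working on a copy so the original string is left alone.
(my $normalized = $text) =~ s/eat_(?:apple|pear|orange)_/eat_fruit_/g;

# Pass 2: an ordinary fixed-length look-behind now does the job.
my @found = $normalized =~ /(?<=eat_fruit_)(?:today|yesterday)/g;

print "@found\n";
```

Match positions refer to the normalized copy, so this fits best when only the matched text matters and not its offset in the original string.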