
I'm curious about the best practices for using a different regex engine in place of the default Perl one, and why the modules I've seen are pragmas rather than a more traditional OO/procedural interface.

I've seen a handful of modules for replacing the Perl regex engine with PCRE (re::engine::PCRE), TRE (re::engine::TRE), or RE2 (re::engine::RE2) in a given lexical scope, but I can't find any object-oriented modules for creating and compiling regular expressions with a different back end. Why would someone implement this functionality as a pragma rather than as a more typical module? Replacing the Perl regex engine seems like it would be a lot harder (depending on the complexity of the API it exposes) than writing an XS module that exposes the API PCRE, TRE, and RE2 already provide.

Greg Nisbet
  • What did the authors of those modules say when you asked them? – Calle Dybedahl Jul 26 '15 at 09:47
  • It's because it's more natural in Perl to use `s/re/repl/`, for instance, than to call some module method. Also, you'd have to use `q/re/` instead of regex literals. – Lucas Trzesniewski Jul 26 '15 at 12:13
  • @CalleDybedahl I didn't ask them. I thought it would be rude to ask such a basic question directly to the package maintainers rather than a more general forum. – Greg Nisbet Jul 27 '15 at 04:43
  • [MarpaX::Languages::M4](http://metacpan.org/pod/MarpaX::Languages::M4) is an example of an OO package that is using another regexp engine – Jean-Damien Durand Jul 29 '15 at 04:58

1 Answer


I'm curious about...why the modules I've seen are pragmas and not a more traditional OO/procedural interface.

Probably because the Perl regex API, documented in perldoc perlreapi and available since 5.9.5, lets you take advantage of Perl's parser, which gives you a lot of cool features with little code.

If you use the API, you:

  • don't have to implement your own version of split and the substitution operator s///
  • don't have to write your own code to parse regex modifiers (the /msixpn modifiers are passed as flags to your implementation's callback functions)
  • can take advantage of optimizations like constant regexes being compiled only once (at compile time) and regexes containing interpolated variables being compiled only when the variables change
  • can use qr in your programs to quote regular expressions and easily interpolate them into other regexes
  • can easily set numbered and named capture variables, e.g. $1, $+{foo}
  • don't force users of your engine to rewrite all of their code to use your API; they can simply add a pragma
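A minimal sketch of what "free functionality" means for users, written in core Perl only. The commented-out `use re::engine::PCRE;` line assumes that module is installed; the idea is that enabling such a pragma would, in principle, be the only change this code needs, because qr//, capture variables, split and s/// all go through the engine via the API.

```perl
use strict;
use warnings;

# use re::engine::PCRE;  # assumption: re::engine::PCRE is installed; enabling it
#                        # would, in principle, hand the regexes below to PCRE

# qr// quotes a regex (here with the /x modifier) so it can be interpolated later.
my $date = qr/ (?<year>\d{4}) - (?<month>\d\d) - (?<day>\d\d) /x;

my $text = "posted 2015-07-26, edited 2015-07-29";

# Numbered and named capture variables are set after a successful match.
my ($year, $month);
if ($text =~ /posted $date/) {
    $year  = $1;           # first group inside the interpolated $date
    $month = $+{month};    # named capture
}

# split and s/// work unchanged with the same patterns.
my @fields = split /,\s*/, $text;
(my $masked = $text) =~ s/$date/YYYY-MM-DD/g;

print "$year $month\n";          # 2015 07
print scalar(@fields), "\n";     # 2
print "$masked\n";               # posted YYYY-MM-DD, edited YYYY-MM-DD
```

Whether a particular plugin engine supports every one of these features (named captures, for example) depends on how completely it implements the perlreapi callbacks.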

There are probably more that I've missed. The point is, you get a lot of free code and free functionality with the API. If you look at the implementation of re::engine::PCRE, for example, it's actually fairly short (< 400 lines of XS code).

Alternatives

If you're just looking for an easier way to implement your own regex engine, check out re::engine::Plugin, which lets you write your implementation in Perl instead of C/XS. Do note that there is a long list of caveats, including no support for split and s///.

Alternatively, instead of implementing a completely custom engine, you can extend the built-in engine by using overloaded constants as described in perldoc perlre. This only works for constant regexes; you have to explicitly convert variables before interpolating them into a regex.
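A minimal single-file sketch of the overloaded-constant approach, loosely adapted from the custom-escape example in perlre. The `\Y` escape (a year from 1900 to 2099) and the `expand_escapes` helper are hypothetical names chosen for illustration. The handler rewrites constant regexes at compile time, while a pattern held in a variable has to be passed through the same helper by hand.

```perl
use strict;
use warnings;
use overload ();    # loads overload::constant without installing any overloads

# Hypothetical custom escape: \Y stands for a year from 1900 to 2099.
# A literal "\\" is kept as-is so that "\\Y" is not mistaken for the escape.
sub expand_escapes {
    my ($re) = @_;    # first argument: the constant piece as written in the source
    my %rules = ('\\' => '\\\\', 'Y' => '(?:19|20)\d\d');
    $re =~ s{\\(\\|Y)}{$rules{$1}}g;
    return $re;
}

# Install the handler for constant regex pieces in the rest of this lexical scope.
BEGIN { overload::constant(qr => \&expand_escapes) }

# Constant regex: the handler rewrites \Y before the engine compiles it.
my $year_re = qr/^\Y$/;

# Interpolated variables bypass the handler, so convert them explicitly.
my $from_input = '\Y-\d\d';
my $converted  = expand_escapes($from_input);
my $month_re   = qr/^$converted$/;

print "2015 matches\n"    if "2015" =~ $year_re;
print "1815 rejected\n"   if "1815" !~ $year_re;
print "2015-07 matches\n" if "2015-07" =~ $month_re;
```

In a reusable module, the `overload::constant` call would normally go in the module's `import` method, as in the perlre example, so that `use`-ing the module activates the escape only in the caller's lexical scope.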

ThisSuitIsBlackNot