Why implement a different regex engine (e.g. PCRE) as a pragma?

Question

I'm curious about the best practices for using a different regex engine in place of the default Perl one, and why the modules I've seen are pragmas rather than a more traditional OO/procedural interface.

I've seen a handful of modules for replacing the Perl regex engine with PCRE (re::engine::PCRE), TRE (re::engine::TRE), or RE2 (re::engine::RE2) in a given lexical context. I can't find any object-oriented modules for creating/compiling regular expressions that use a different back end. I'm curious why someone would choose to implement this functionality as a pragma rather than as a more typical module. It seems like replacing the Perl regex engine would be a lot harder (depending on the complexity of the API it exposes) than writing an XS module that exposes the API that PCRE, TRE, and RE2 already provide.

What did the authors of those modules say when you asked them? — Calle Dybedahl, Jul 26 '15 at 09:47
It's because it's more natural in Perl to use `s/re/repl/` for instance than to call some module method. also, you'd have to use `q/re/` instead of regex literals. — Lucas Trzesniewski, Jul 26 '15 at 12:13
@CalleDybedahl I didn't ask them. I thought it would be rude to ask such a basic question directly to the package maintainers rather than a more general forum. — Greg Nisbet, Jul 27 '15 at 04:43
[MarpaX::Languages::M4](http://metacpan.org/pod/MarpaX::Languages::M4) is an example of an OO package that is using another regexp engine — Jean-Damien Durand, Jul 29 '15 at 04:58

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

> I'm curious about ... why the modules I've seen are pragmas and not a more traditional OO/procedural interface.

Probably because the Perl regex API, documented in `perldoc perlreapi` and available since Perl 5.9.5, lets you plug into Perl's own parser, which gives you a lot of cool features with little code.

If you use the API, you:

- don't have to implement your own version of `split` and the substitution operator `s///`
- don't have to write your own code to parse regex modifiers (`msixpn` are passed as flags to your implementation's callback functions)
- can take advantage of optimizations like constant regexes being compiled only once (at compile time) and regexes containing interpolated variables being compiled only when the variables change
- can use `qr` in your programs to quote regular expressions and easily interpolate them into other regexes
- can easily set numbered and named capture variables, e.g. `$1`, `$+{foo}`
- don't force users of your engine to rewrite all of their code to use your API; they can simply add a pragma

There are probably more that I've missed. The point is, you get a lot of free code and free functionality with the API. If you look at the implementation of re::engine::PCRE, for example, it's actually fairly short (< 400 lines of XS code).
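To make the list above concrete, here is a minimal sketch of ordinary Perl code that leans on those parser-provided features. It runs on the built-in engine as written. The commented-out pragma line marks where an alternate engine would be switched on, assuming the module is installed and supports the constructs used. The helper names (`parse_pair`, `split_fields`, `mask_digits`) are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: with re::engine::PCRE installed, uncommenting the next line
# would hand the regexes compiled in this lexical scope to PCRE instead.
# use re::engine::PCRE;

# qr// quotes a regex; interpolating it into another regex is handled by Perl.
my $word = qr/(?<word>\w+)/;
my $pair = qr/$word=(?<value>\d+)/;

# Named captures show up in %+ without any engine-specific accessor calls.
sub parse_pair {
    my ($text) = @_;
    return unless $text =~ $pair;
    return ($+{word}, $+{value});
}

# split takes a regex directly; no hand-written splitting loop is needed.
sub split_fields {
    my ($line) = @_;
    return split /\s*,\s*/, $line;
}

# s///g works on a copy so the caller's string is left untouched.
sub mask_digits {
    my ($text) = @_;
    (my $copy = $text) =~ s/\d/#/g;
    return $copy;
}

my ($key, $value) = parse_pair("width=42");
print "$key -> $value\n";
print join('|', split_fields("a , b,c")), "\n";
print mask_digits("a1b22"), "\n";
```

None of the calling code mentions an engine. With an OO wrapper, each of these operations would need its own method call.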

Alternatives

If you're just looking for an easier way to implement your own regex engine, check out re::engine::Plugin, which lets you write your implementation in Perl instead of C/XS. Do note that there is a long list of caveats, including no support for `split` and `s///`.

Alternatively, instead of implementing a completely custom engine, you can extend the built-in engine by using overloaded constants, as described in the "Creating Custom RE Engines" section of `perldoc perlre`. This only works in constant regexes; you have to explicitly convert variables before interpolating them into a regex.
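A rough sketch of the overloaded-constant approach, loosely following the example in `perldoc perlre`. The package name `BoundaryRE`, the helper `mark_boundaries`, and the made-up escape `\Y|` (a boundary between whitespace and non-whitespace) are assumptions for illustration. Normally the package would live in its own file and be loaded with `use`; a `BEGIN` block stands in for that here so the example fits in one file.

```perl
use strict;
use warnings;

{
    package BoundaryRE;    # hypothetical name; normally its own .pm file
    use overload ();

    # Called at compile time for each constant piece of a regex.
    sub convert {
        my ($source) = @_;    # the regex text as written in the program
        my $boundary = '(?:(?=\S)(?<!\S)|(?!\S)(?<=\S))';
        # Keep escaped backslashes as they are; expand \Y| everywhere else.
        $source =~ s/(\\\\)|\\Y\|/defined $1 ? $1 : $boundary/ge;
        return $source;
    }

    # Register the handler for regex constants in the scope being compiled.
    sub import { overload::constant(qr => \&convert) }
}

# Stands in for "use BoundaryRE;" since the package is defined inline.
BEGIN { BoundaryRE->import }

# The built-in engine never sees \Y|; it receives the expanded pattern.
sub mark_boundaries {
    my ($text) = @_;
    (my $copy = $text) =~ s/\Y|/|/g;
    return $copy;
}

print mark_boundaries("ab  cd"), "\n";
```

Because the handler only sees constant pieces, a pattern held in a variable would have to be passed through `BoundaryRE::convert` by hand before being interpolated.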
