I'm curious about...why the modules I've seen are pragmas and not a more traditional OO/procedural interface.
Probably because the pluggable regex engine API, documented in perldoc perlreapi and available since Perl 5.9.5 (first in a stable release with 5.10.0), lets you take advantage of Perl's parser, which gives you a lot of cool features for very little code.
If you use the API, you:
- don't have to implement your own version of split or the substitution operator s///
- don't have to write your own code to parse regex modifiers (msixpn are passed as flags to your implementation's callback functions)
- can take advantage of optimizations like constant regexes being compiled only once (at compile time) and regexes containing interpolated variables being compiled only when the variables change
- can use qr// in your programs to quote regular expressions and easily interpolate them into other regexes
- can easily set numbered and named capture variables, e.g. $1, $+{foo}
- don't force users of your engine to rewrite all of their code to use your API; they can simply add a pragma
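To make the list concrete, here is a small sketch that exercises those features (qr// interpolation, numbered and named captures, split, s///) with Perl's built-in engine. The commented-out use re::engine::PCRE line only marks where a pragma-based engine would be switched on; it is left disabled because that module comes from CPAN and may not be installed, and whether a particular engine supports every construct used here (named captures, for instance) depends on that engine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# use re::engine::PCRE;   # a pragma engine would be enabled here (CPAN module, not assumed installed)

# qr// quotes a regex; the compiled object can be interpolated into another one.
my $word = qr/(?<word>\w+)/;
my $pair = qr/$word=(\d+)/;

my ($name, $first, $value);
if ("answer=42" =~ $pair) {
    # The engine fills in both the named capture and the numbered ones.
    ($name, $first, $value) = ($+{word}, $1, $2);
}
print "$name $first $value\n";

# split and s/// use whatever engine compiled their pattern.
my @fields = split /,\s*/, "a, b,c";
print join("|", @fields), "\n";

(my $dashed = "a b c") =~ s/\s+/-/g;
print "$dashed\n";
```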
There are probably more that I've missed. The point is, you get a lot of free code and free functionality with the API. If you look at the implementation of re::engine::PCRE, for example, it's actually fairly short (under 400 lines of XS code).
Alternatives
If you're just looking for an easier way to implement your own regex engine, check out re::engine::Plugin, which lets you write your implementation in Perl instead of C/XS. Do note that there is a long list of caveats, including no support for split and s///.
Alternatively, instead of implementing a completely custom engine, you can extend the built-in engine by using overloaded constants (a qr handler installed with overload::constant), as described in perldoc perlre. This only works for constant regexes; you have to explicitly convert variables before interpolating them into a regex.
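The overloaded-constant approach can be sketched with the customre example from perldoc perlre, lightly adapted so it fits in one file: the two BEGIN blocks stand in for a separate customre.pm loaded with use customre. It adds a \Y| escape that matches at a boundary between whitespace and non-whitespace characters. The handler only ever sees the constant text of a regex, so the pattern built at run time near the end has to be passed through customre::convert by hand, which is the limitation described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Normally this package would live in customre.pm. Wrapping it in BEGIN
# makes sure %rules is populated before the main code below is compiled.
BEGIN {
    package customre;
    use overload;

    sub import {
        shift;
        die "No argument to customre::import allowed" if @_;
        # Install convert() as the handler for constant parts of regexes
        # in the scope currently being compiled.
        overload::constant 'qr' => \&convert;
    }

    sub invalid { die "/$_[0]/: invalid escape '\\$_[1]'" }

    # \\ maps to itself so an escaped backslash is not mistaken for the
    # start of \Y; \Y| becomes a whitespace/non-whitespace boundary.
    my %rules = (
        '\\' => '\\\\',
        'Y|' => qr/(?=\S)(?<!\S)|(?!\S)(?<=\S)/,
    );

    sub convert {
        my $re = shift;
        $re =~ s{
                  \\ ( \\ | Y . )
                }
                { $rules{$1} or invalid($re, $1) }sgex;
        return $re;
    }
}

# Equivalent to "use customre;" for a package defined in this file.
BEGIN { customre->import }

# Constant regex: \Y| is rewritten by the handler at compile time.
my $marked = "foo bar" =~ s/\Y|/|/gr;
print "$marked\n";

# Run-time pattern: the handler never sees it, so convert it explicitly.
my $runtime = '\Y|';
$runtime = customre::convert($runtime);
my $count = () = "a b c" =~ /$runtime/g;
print "$count\n";
```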