This answer explains that to validate an arbitrary regular expression, one simply uses eval:

while (<>) {
    # Unsafe: the input is pasted into Perl source before being compiled
    eval "qr/$_/;";
    print $@ ? "Not a valid regex: $@\n" : "That regex looks valid\n";
}

However, this strikes me as very unsafe, for what I hope are obvious reasons. Someone could input, say:

foo/; system('rm -rf /'); qr/

or whatever devious scheme they can devise.

The natural way to prevent such things is to escape special characters, but if I escape too many characters, I severely limit the usefulness of the regex in the first place. A strong argument can be made, I believe, that at least []{}()/-,.*?^$! and white space characters ought to be permitted (and probably others), un-escaped, in a user regex interface, for the regexes to have minimal usefulness.

Is it possible to secure myself from regex injection, without limiting the usefulness of the regex language?

Jonathan Hall
  • Can't you just ensure that the delimiter character, `/`, is not left unescaped in the body of the regex? If the regex doesn't terminate early, it can't inject anything. Now, making sure that's really so, that'll require knowing perl regexes pretty well. – 15ee8f99-57ff-4f92-890c-b56153 Dec 03 '13 at 17:32
  • @EdPlunkett: That's much easier said than done. What if my input was: `/foo[$%/]/`; it's perfectly safe, but not easy to see that without doing some extensive regex parsing. (Although an argument could possibly be made that only escaped `/` should be permitted, even when they wouldn't otherwise be strictly required, and I can't think of an example where an un-escaped `/` would be required.) – Jonathan Hall Dec 03 '13 at 17:33
  • @EdPlunkett Unfortunately for this case, Perl supports arbitrary code in regexes through `(?{ })`, allowing a valid regex to execute Perl code. – user1937198 Dec 03 '13 at 17:34
  • @user1937198: It's easy enough to `no re 'eval'` (I think that's a sufficient safeguard against that particular vector... no?) – Jonathan Hall Dec 03 '13 at 17:34
  • @user1937198 - Forgot about that! Good ol' Perl. There's always another way to break it! – 15ee8f99-57ff-4f92-890c-b56153 Dec 03 '13 at 17:35
  • @Flimzy Actually, `[\/]` works fine for me, so let 'em escape it if they want it. Anybody who can compose a useful regular expression in the first place is going to be able to understand why you're imposing the constraint. – 15ee8f99-57ff-4f92-890c-b56153 Dec 03 '13 at 17:41
  • `no re 'eval'` is even the default. – ikegami Dec 03 '13 at 17:41
  • This question is similar to the duplicate: [How can I safely use regexes from user input?](http://stackoverflow.com/questions/2159355/how-can-i-safely-use-regexes-from-user-input) – Mike Mestnik May 20 '15 at 02:58

2 Answers

The solution is simply to change

eval("qr/$_/")

to

eval("qr/\$_/")

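Escaping the `$` means the evaluated code contains the variable `$_` rather than its contents, so the input is interpolated as a pattern and is never parsed as Perl source. This can be written more clearly with single quotes: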

eval('qr/$_/')

But that's still not optimal. The following would be far better as it doesn't involve generating and compiling Perl code at run-time:

eval { qr/$_/ }
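
As a minimal sketch of how the block form of `eval` could be wrapped for validation (the helper name `compile_user_regex` and the sample patterns are made up for illustration, not part of the original answer):

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern without string eval.
# Returns (compiled regex, undef) on success or (undef, error message) on failure.
sub compile_user_regex {
    my ($pattern) = @_;
    # Block eval only traps the exception thrown by a bad pattern;
    # $pattern is interpolated as regex text, never as Perl code.
    my $re = eval { qr/$pattern/ };
    return ($re, undef) if defined $re;
    return (undef, $@);
}

for my $pattern ('fo+', 'foo[', q{foo/; die "injected"; qr/}) {
    my ($re, $err) = compile_user_regex($pattern);
    print defined $re
        ? "That regex looks valid: $pattern\n"
        : "Not a valid regex: $err";
}
```

The third sample is the kind of input the question worries about. Here it is simply compiled as a pattern containing slashes, semicolons and quotes, so nothing in it runs as code.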

Note that none of these variants protects you from denial-of-service attacks. It's quite easy to write a pattern that will take longer than the life of the universe to complete. To handle that situation, you could execute the regex match in a child process for which a CPU ulimit has been set.
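
A rough sketch of the child-process idea, using only core Perl on a Unix-like system. Note the substitution: this uses a wall-clock `alarm` in the child rather than a CPU ulimit (setting a real CPU limit would need something like the shell's `ulimit -t` or a CPAN module). The helper name `match_with_timeout` and the exit-code convention are assumptions for illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run one match in a forked child that is killed after
# $timeout seconds. Returns 1 (matched), 0 (no match), or undef (killed or failed).
sub match_with_timeout {
    my ($re, $text, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: with the default SIGALRM disposition the kernel terminates
        # the process, even if the regex engine is still busy.
        $SIG{ALRM} = 'DEFAULT';
        alarm $timeout;
        my $matched = ($text =~ $re) ? 1 : 0;
        # _exit skips END blocks and buffer flushing inherited from the parent.
        POSIX::_exit($matched ? 0 : 1);
    }

    # Parent: wait for the child and decode its exit status.
    waitpid($pid, 0);
    return undef if $? & 127;    # child was terminated by a signal
    my $code = $? >> 8;
    return $code == 0 ? 1 : $code == 1 ? 0 : undef;
}

my $result = match_with_timeout(qr/fo+/, 'foo', 2);
print defined $result ? "result: $result\n" : "gave up\n";
```

Forking per match is expensive, so in practice one would probably batch the work in the child. The sketch only shows how a runaway match can be contained.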

ikegami
  • The answer to correctly untaint a regex might prevent the invocation of vastly impossible to complete subroutines, I'm just saying. – Mike Mestnik May 17 '15 at 22:20
  • @Mike Mestnik, I'd be happy to include a reference to an algorithm to detect long-running patterns if you know of one. – ikegami May 17 '15 at 22:27
  • @ikegami See [Question@perlmonks](http://www.perlmonks.org/?node_id=1126914), specifically [Summary of Answers](http://www.perlmonks.org/?node_id=1126941). – Mike Mestnik May 19 '15 at 00:24

There is some discussion about this over at The Monastery.

TLDR: use re::engine::RE2 (-strict => 1);

Make sure to add (-strict => 1) to your use statement, or re::engine::RE2 will fall back to Perl's own regex engine.

The following is a quote from junyer, owner of the project on GitHub.

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.

Mike Mestnik