How can I safely use regexes from user input?

Question

My (Perl-based) application needs to let users input regular expressions, to match various strings behind the scenes. My plan so far has been to take the string and wrap it in something like

my $regex = eval { qr/$text/ };
if (my $error = $@) {
    # mangle $error to extract user-facing message
}

($text having been stripped of newlines ahead of time, since it's actually multiple regular expressions in a multi-line text-field that I split).
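A slightly fuller sketch of that approach, assuming the text field arrives as one string with one pattern per line. The sub name `compile_user_patterns` and the error-trimming regex are illustrative, not from any module. Perl appends " at FILE line N." to compile errors, and that suffix is what gets trimmed (this assumes the script's file name contains no whitespace).

```perl
use strict;
use warnings;

# Hypothetical helper: compile each non-empty line of a multi-line field.
# Returns (arrayref of compiled regexes, arrayref of user-facing errors).
sub compile_user_patterns {
    my ($field) = @_;
    my (@regexes, @errors);
    for my $line (split /\r?\n/, $field) {
        next unless length $line;
        my $re = eval { qr/$line/ };
        if (my $error = $@) {
            # Drop the " at FILE line N." location Perl appends to the message;
            # assumes the file name contains no whitespace.
            $error =~ s/ at \S+ line \d+\.\n?\z//;
            push @errors, "Invalid pattern '$line': $error";
            next;
        }
        push @regexes, $re;
    }
    return (\@regexes, \@errors);
}
```

Collecting errors instead of dying on the first one lets the form report every bad line at once.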

Are there any potential security risks with doing this - some weird input that could lead to arbitrary code execution? (Besides the buffer overflow vulnerabilities in the regular expression engines like CVE-2007-5116). If so, are there ways to mitigate them?

Is there a better way to do this? Any Perl modules which help abstract the operations of turning user input into regular expressions (such as extracting error messages ... or providing modifiers like /i, which I don't strictly need here, but would be nice)? I searched CPAN and didn't find much that was promising, but entertain the possibility that I missed something.
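If modifiers such as /i are wanted, one option (a sketch, not a CPAN module) is to accept only a whitelist of flag letters and prepend them as an inline modifier like `(?i)`. The sub name `compile_with_flags` and the allowed set `imsx` are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: compile $text with user-chosen modifiers.
# Returns ($regex, undef) on success or (undef, $error_message) on failure.
sub compile_with_flags {
    my ($text, $flags) = @_;
    $flags //= '';
    # Only whitelisted flag letters are accepted.
    return (undef, "Unsupported modifier(s): $flags")
        if $flags =~ /[^imsx]/;
    # An inline modifier such as (?i) applies to the rest of the pattern.
    my $prefix = length $flags ? "(?$flags)" : '';
    my $re = eval { qr/$prefix$text/ };
    return ($re, $@ || undef);
}
```

Putting the modifier at the top level, rather than wrapping the user's text in a `(?i:...)` group, means a stray `)` in the input cannot close the group early and change which part the flag covers.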

@Ether: How does tainting help here? It helps keep you from accidentally using untrusted input where it could cause a security problem. Here, we're looking for a way to safely use an untrusted regex. — cjm, Jan 29 '10 at 09:15
The answer by Mike Mestnik should be preferred as the accepted answer as it actually delivers a solution to the problem. The other answers are good, but mainly highlight why user-specified regexes are a problem. Even the NFA/DFA resource problems are addressed by [`re::engine::RE2`](http://search.cpan.org/~dgl/re-engine-RE2-0.13/lib/re/engine/RE2.pm). — sshine, Mar 05 '18 at 09:45
@SimonShine This question was asked in 2010, the oldest release of the package in question that is listed on CPAN is from 2011, Mike's answer was posted in 2015 and it is now 2018. I can respect the merits of updating information on the Internet, but ... is it really part of StackOverflow's social culture to change the award of an 'Accepted' answer in response to new developments like this eight years after the fact? Until I understand clearly that this is the case, I will leave the green check mark where it is and permit StackOverflow readers to glean this information from the comments. — , Mar 12 '18 at 21:46
@fennec: It most certainly should be the case, yes. The main purpose for reading a Q/A is because you want to solve a related problem today, not several years ago. It is true that the correct answer has changed over time. If you don't want to acknowledge that, you will leave several people stumbling through outdated solutions until they end up at the bottom answer, which happens to be the most helpful. I'm not blaming you for not keeping track of all answers to all questions you've asked, and I'm not saying all the other answers are useless. But the most practical answer today is Mike's. — sshine, Mar 13 '18 at 08:44
Some inspiration from meta: [It's entirely the prerogative of the OP to accept any answer they deem most suitable.](https://meta.stackoverflow.com/a/294696/235908) [There is nothing inherently wrong with changing the accepted answer.](https://meta.stackoverflow.com/questions/335277/change-accepted-answer-after-some-years) I've elaborated on this subject [in this meta discussion](https://meta.stackoverflow.com/a/364527/235908) for posterity. — sshine, Mar 13 '18 at 10:08

score 6 · Answer 1 · answered Jan 29 '10 at 03:20

Using untrusted input as a regular expression creates a denial-of-service vulnerability, as described in perlsec:

Regular expressions - Perl's regular expression engine is so called NFA (Non-deterministic Finite Automaton), which among other things means that it can rather easily consume large amounts of both time and space if the regular expression may match in several ways. Careful crafting of the regular expressions can help but quite often there really isn't much one can do (the book "Mastering Regular Expressions" is required reading, see perlfaq2). Running out of space manifests itself by Perl running out of memory.

I can cope with exposing a DOS vulnerability. Goodness knows there are plenty in the rest of the application for people who can enter these regexpen. A magical 'wipe hard disk' button is another matter, though. :) — , Jan 29 '10 at 17:57

score 5 · Accepted Answer · answered Jan 29 '10 at 02:30

With the (?{ code }) construct, user input could be used to execute arbitrary code. See the example in perlre#code and where it says

local $cnt = $cnt + 1,

replace it with the expression

system("rm -rf /home/fennec"); print "Ha ha.\n";

(Actually, don't do that.)
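A quick way to check how your perl treats a code block that arrives through interpolation, as in the question's `qr/$text/`. This is a sketch; the helper name `try_compile` is hypothetical, and the error text is the perldiag "Eval-group not allowed at runtime" message, whose wording may vary between Perl versions (under taint mode a different message is used).

```perl
use strict;
use warnings;

# Returns 'compiled' or 'rejected: <error>' for a pattern built by
# interpolating untrusted text.
sub try_compile {
    my ($text) = @_;
    my $re = eval { qr/$text/ };
    return defined $re ? 'compiled' : "rejected: $@";
}

# Simulated hostile input: a code block followed by an ordinary atom.
# Without "use re 'eval'" in scope, the pattern is refused when it is
# compiled, so the embedded print never runs.
print try_compile('(?{ print "code ran\n" })x'), "\n";
```

If this prints "compiled", something (for example a `use re 'eval'` in scope) has enabled runtime code blocks.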

answered by mob

Fortunately, `(?{ code })` causes a compile time error if the regex includes variable interpolation unless you say `use re 'eval'` (for exactly this reason). – cjm Jan 29 '10 at 06:07

@cjm - But it's not an error to say `$re=eval{qr/$tainted/}` and then to use that regex, as the OP has done (unless you use `taintperl`) – mob Jan 29 '10 at 14:48
Ah, with the help of your pointer I found this within the docs: "Before Perl knew how to execute interpolated code within a pattern, this operation was completely safe from a security point of view, although it could raise an exception from an illegal pattern." This is comforting. – Jan 29 '10 at 18:09
"*user input could be used to execute arbitrary code*" - I tried it, and it just throws an exception `Eval-group not allowed at runtime, use re 'eval' in regex m/.../`. – melpomene May 01 '19 at 10:16

score 5 · Answer 3 · edited Mar 05 '18 at 09:42

There is some discussion about this over at The Monastery.

TLDR: use re::engine::RE2 -strict => 1;

Make sure to add -strict => 1 to your use statement, or re::engine::RE2 will fall back to Perl's own regex engine for patterns RE2 can't handle.

The following is a quote from Paul Wankadia (junyer), owner of the project on GitHub:

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.

To sum up the important points:

It's safe from arbitrary code execution by default, but add "no re 'eval';" so that PERL5OPT (or anything else) can't turn it on for you. I'm not sure whether doing so prevents everything.
Use a sub-process (fork) with BSD::Resource (even on Linux) to limit memory, and kill the child after a timeout.
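The two points above can be combined into a sketch like the following. It assumes a Unix-like system with fork; the sub name `match_in_child`, the exit codes and the polling interval are illustrative choices. The memory cap uses BSD::Resource (a CPAN module) only if it is installed. A child process is used rather than `alarm` because Perl's deferred ("safe") signals are only checked between opcodes, so an alarm generally cannot interrupt a single long-running match (see perlipc).

```perl
use strict;
use warnings;
no re 'eval';                       # refuse interpolated (?{ }) / (??{ }) blocks
use POSIX qw(:sys_wait_h);          # WNOHANG
use Time::HiRes qw(time sleep);

# Hypothetical helper: run one untrusted match in a child process so that a
# runaway pattern can be killed. Returns 'match', 'no match', 'error' or 'timeout'.
sub match_in_child {
    my ($pattern, $string, $timeout, $mem_bytes) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: optionally cap the address space; skipped silently if
        # BSD::Resource is not installed.
        if ($mem_bytes) {
            eval {
                require BSD::Resource;
                BSD::Resource::setrlimit(BSD::Resource::RLIMIT_AS(), $mem_bytes, $mem_bytes);
            };
        }
        my $re = eval { qr/$pattern/ };
        POSIX::_exit(12) unless defined $re;      # compile error
        POSIX::_exit($string =~ $re ? 10 : 11);   # _exit skips END blocks and buffers
    }

    # Parent: poll until the child finishes or the deadline passes.
    my $deadline = time() + $timeout;
    while (time() < $deadline) {
        if (waitpid($pid, WNOHANG) == $pid) {
            return 'error' if $? & 127;           # killed by a signal
            my $code = $? >> 8;
            return $code == 10 ? 'match'
                 : $code == 11 ? 'no match'
                 :               'error';
        }
        sleep 0.01;
    }
    kill 'KILL', $pid;
    waitpid($pid, 0);                             # reap the killed child
    return 'timeout';
}
```

Any unexpected exit status (including Perl dying with "Out of memory!") is reported as 'error' rather than being mistaken for a result.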

score 3 · Answer 4 · answered Jan 29 '10 at 02:00

The best way is not to give users too much privilege. Provide an interface that offers just enough for users to do what they want (like an ATM with only buttons for the various options and no keyboard). Of course, if you need users to type input, provide a text box and then process the request with Perl at the back end (sanitizing it, for example). The motive behind letting your users input a regex is to search for string patterns, right? In that case, the simplest and most secure way is to have them enter just the string. Then, at the back end, you use Perl's regex engine to search for it. Is there any other compelling reason to have users input regexes themselves?
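If plain substring search is enough, the user's text can still go through Perl's regex engine safely by escaping it with `\Q...\E` (quotemeta), so characters like `.`, `*` and `(` are matched literally. A minimal sketch; the helper name `literal_matcher` and the optional case-insensitive flag are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: build a matcher that treats $needle as a literal string.
sub literal_matcher {
    my ($needle, $ignore_case) = @_;
    # \Q...\E applies quotemeta to the interpolated text, escaping every
    # non-word character, so no regex syntax in $needle is interpreted.
    my $re = $ignore_case ? qr/\Q$needle\E/i : qr/\Q$needle\E/;
    return sub { $_[0] =~ $re };
}

my $has_dot_c = literal_matcher('a.c');
print $has_dot_c->('xa.cx') ? "found\n" : "not found\n";
```

For a single case-sensitive substring, `index` works equally well; the regex form just makes it easy to add options like /i.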

answered by ghostdog74 (edited Jan 29 '10 at 02:13)

Presumably if they want to search for *patterns*, searching for plain strings is going to be orders of magnitude less powerful than being able to search by regexes. – Wooble Jan 29 '10 at 02:20
Yes. $customers demand more flexibility than a simple string match is capable of providing in this case. As for privileges, though, only moderately-trusted users get to do the regular expressions anyway. I just don't want to extend these users system("rm -rf /") capabilities and the like. – Jan 29 '10 at 18:02

score 1 · Answer 5 · answered Jan 29 '10 at 18:30

Perhaps you could use a different regex engine that does not have the dangerous code tag support.

I haven't tried it, but there is a PCRE engine for Perl. You may also be able to limit or remove code support using the documentation on creating custom regex engines.

answered by daotoad
