
The first input comes from the user: a tree of nodes (of significant height and depth). Each node contains a regex and its modifiers. The tree is read only once, at application startup, and is kept in memory.

The second input is a value that is matched against the tree, starting at the root node, until an exactly matching leaf node is found (depth-first search). A match is determined as follows:

my $evalstr = <<EOEVAL;
if(\$input_value =~ /\$node_regex/$node_modifiers){
    1;
}else{
    -1;
}
EOEVAL

no strict 'refs';
my $return_value = eval "no strict;$evalstr";
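To make the check above easier to diagnose, here is a minimal sketch of the same string eval wrapped in a helper that separates "the eval itself failed" from "the regex did not match". The name `match_node` and the modifier whitelist are assumptions for illustration, not part of the original application.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code).
# Returns 1 on match, -1 on no match, undef if the eval could not run.
sub match_node {
    my ($input_value, $node_regex, $node_modifiers) = @_;

    # Assumption: only these modifier letters are legal. Rejecting anything
    # else keeps arbitrary text out of the evaluated code string.
    return unless $node_modifiers =~ /\A[gimosx]*\z/;

    # Only the modifiers are interpolated now; the two variables are
    # resolved when the eval runs, as in the original heredoc.
    my $code = "\$input_value =~ /\$node_regex/$node_modifiers ? 1 : -1";

    local $@;
    my $result = eval $code;
    warn "eval failed for /$node_regex/$node_modifiers: $@"
        unless defined $result;
    return $result;
}
```

With this shape, a return value of -1 means the eval ran and the match genuinely failed, which is the case described below, while `undef` points at a compile or runtime error reported in `$@`.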

The second input is provided continuously by a source throughout the application's lifetime.

Problem: The above code works well for some time (approx. 10 hours), but after continuous input for that long, the eval starts failing consistently and I get -1 in $return_value. All other features of the application keep working fine, including other comparison statements. If I restart the application, the matching works again and gives proper results.

Observations: 1) I get the deep recursion warning many times, but I read somewhere that this is normal, as my stack depth will often exceed 100 given the size of the input tree. 2) If I use a plain regex match without eval, as follows, I don't get any issue however long the application runs:

if($input_value =~ /$node_regex/){
    $return_value=1;
}else{
    $return_value=-1;
}

But then I have to sacrifice dynamic modifiers, as described in "Dynamic Modifiers".
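One possible way to keep most of the dynamic modifiers without a string eval is to embed them in the pattern with Perl's inline `(?flags:...)` syntax. The sketch below is an assumption about how that could look; `compile_node_regex` is a hypothetical name. Only `i`, `m`, `s` and `x` can be embedded this way; `g` and `o` have no inline form and are simply dropped here.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a compiled regex once per node.
sub compile_node_regex {
    my ($node_regex, $node_modifiers) = @_;

    # Keep only the modifiers that have an inline form, without duplicates.
    my %seen;
    my $inline = join '',
        grep { /[imsx]/ && !$seen{$_}++ } split //, $node_modifiers;

    # (?i:...) applies the flags to the enclosed pattern only.
    my $source = "(?${inline}:${node_regex})";
    return qr/$source/;
}

# Usage: compile at tree-load time, match later with no eval.
my $compiled = compile_node_regex('^(0|1)$', 'gi');
my $return_value = ('1' =~ $compiled) ? 1 : -1;
```

The compiled regex could be stored on the node at startup, so the per-value matching path contains neither a string eval nor a recompilation.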

Checks: 1) I checked $@, but it is empty. 2) I also printed the values of $input_value, $node_regex and $node_modifiers at the failure point; they are correct, and the value should have matched the regex. 3) I checked memory usage, but it stays fairly constant over time for the perl process. 4) I was using Perl 5.8.8 and then updated to 5.12, but I still face the same issue.

Question: What could be the cause of the above issue? Why does it fail after some time but work well once the application is restarted?

Sushant
  • Where does the regex come from? It's a wild guess, but perhaps it's running into catastrophic backtracking, which would make sense with the performance degrading over deeper and deeper searches through the tree. Can you post some of the regexes used? Some of the text they're supposed to match? – Tim Pietzcker Oct 04 '11 at 12:14
  • Why would you need the `no strict;` in the eval? Doesn't the code work without it? – Dallaylaen Oct 04 '11 at 12:15
  • As a side note, what if you pre-compile the regexes and store them in the tree instead of doing string evals all the time? See: On node initialisation (or modification, if the tree is not r/o): `my $cached_regex = eval "qr/\$node_regex/$node_modifiers";` And later in the main loop: `my $return_value = $input_value =~ /$cached_regex/ ? 1:-1;` – Dallaylaen Oct 04 '11 at 12:34
  • The regex that failed to match was: ^(0|1)$ – Sushant Oct 04 '11 at 12:40
  • Tim, the regex that failed to match was: ^(0|1)$. Most of the regex inputs were simple strings, ^[0-5]$ being the most complex :) – Sushant Oct 04 '11 at 12:49
  • Can you provide a little more code? My only advice at this point is to tighten up your variable scopes. Use '{}' blocks if you need to. – Leonardo Herrera Oct 04 '11 at 14:56
  • What modifiers are you actually using that they can't be supplied inside of the pattern? There's probably another way to do this that doesn't involve millions of string evals, and besides sidestepping this problem it will probably also be much faster. – hobbs Oct 04 '11 at 14:56
  • I need to support the following modifiers: gismox. Can I support the modifier g inside the pattern itself? The following fails for the modifier 'g', as g affects the way the regex is used rather than the regex itself: my $cached_regex = eval "qr/\$node_regex/$node_modifiers"; – Sushant Oct 05 '11 at 14:10
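Since `g` comes up in the comments above, it may be worth ruling out one documented behaviour: in scalar context, a `/g` match resumes from `pos()` of the target scalar, so repeated boolean tests on the same variable can alternate between success and failure. The sketch below only illustrates that behaviour with the regex quoted above; whether it explains the failures in the question is an assumption that would need checking against the real code.

```perl
use strict;
use warnings;

my $input_value = '1';

# The same scalar is tested three times with /g in scalar context.
# After a success, pos($input_value) sits at the end of the string,
# so the next attempt starts there and the anchored pattern fails.
my @with_g;
for my $attempt (1 .. 3) {
    push @with_g, ($input_value =~ /^(0|1)$/g ? 1 : -1);
}

# Without /g every attempt starts at offset 0.
my @without_g;
for my $attempt (1 .. 3) {
    push @without_g, ($input_value =~ /^(0|1)$/ ? 1 : -1);
}

print "with g:    @with_g\n";      # with g:    1 -1 1
print "without g: @without_g\n";   # without g: 1 1 1
```

For a plain yes/no match, dropping `g` (or resetting the position with `pos($input_value) = undef` before each test) avoids this state carrying over between calls.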

1 Answer


A definitive answer would require more knowledge of Perl internals than I have. But given what you are doing, continuously matching against a large tree, it seems safe to assume that some limit is being reached or some resource is being exhausted. I would take a close look and make sure that all resources are released between iterations. I would be especially concerned with circular references in the complex structures, and would make sure that there are none.
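As an illustration of the circular-reference concern, here is a minimal sketch of a tree whose child-to-parent links are weakened with `Scalar::Util::weaken`, so that parent and child do not keep each other alive. The node layout (`regex`, `modifiers`, `parent`, `children`) and the `add_child` helper are assumptions; the question does not show the real structure.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# Hypothetical node layout: { regex, modifiers, parent, children }.
sub add_child {
    my ($parent, $child) = @_;
    push @{ $parent->{children} }, $child;   # strong reference downwards
    $child->{parent} = $parent;
    weaken($child->{parent});                # weak reference upwards
    return $child;
}

my $root = { regex => '^[0-5]$', modifiers => '', children => [] };
my $leaf = add_child(
    $root,
    { regex => '^(0|1)$', modifiers => 'i', children => [] },
);

print "parent link is weak\n" if isweak($leaf->{parent});
```

If the real nodes hold back-references like this, weakening the upward link is one common way to let Perl's reference counting free the structure once the root goes out of scope.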

Bill Ruppert