9

I'm encountering an issue where preg_replace() with a complicated regular expression fails with PREG_BACKTRACK_LIMIT_ERROR because pcre.backtrack_limit, which defaults to 1,000,000, is too low. I raised it to 10,000,000, and that works for this particular application.

My question is: what, loosely defined, is the "unit" of the backtracking limit? Does the 1,000,000 figure correspond to a memory size? If not, what does it signify? I'm trying to understand what a reasonable setting is for my environment.

Reference on pcre.backtrack_limit: http://us3.php.net/manual/en/pcre.configuration.php#ini.pcre.backtrack-limit

Reference on backtracking: In regular expressions, what is a backtracking / back referencing?

laketuna
  • First of all, what you are asking here is not about a “unit” – that’s something like a mile or a kilogram, but there is no unit involved here, it’s just a plain number. And it sets the limit for backtracking attempts before giving up on a search that is becoming too complex/memory expensive. – CBroe May 05 '14 at 22:44
  • You've just answered my question. "Number of backtracking ATTEMPTS" is a unit :). If you could provide a link with a reference to this, I'd be happy to accept your answer. – laketuna May 05 '14 at 22:50
  • As an aside, if you get this kind of error, I suggest you rewrite your pattern instead of changing the backtrack_limit. – Casimir et Hippolyte May 05 '14 at 23:28
  • @Casimir, I'm not sure if I can post it here, unfortunately. It's not mine. I was just more interested in what the PHP setting was about. – laketuna May 06 '14 at 00:05

3 Answers

5

From the PCRE source code, this error is returned once the internal function match() has been called more than match_limit times during a match (1,000,000 with PHP's default pcre.backtrack_limit):

/* First check that we haven't called match() too many times, or that we
haven't exceeded the recursive call limit. */

if (md->match_call_count++ >= md->match_limit) RRETURN(PCRE_ERROR_MATCHLIMIT);

PHP's PCRE extension then converts PCRE_ERROR_MATCHLIMIT into a "PHP_PCRE_BACKTRACK_LIMIT_ERROR" error.

According to the pcreapi manpage (see https://serverfault.com/a/408272/140833 ):

Internally, PCRE uses a function called match() which it calls repeatedly (sometimes recursively). The limit set by match_limit is imposed on the number of times this function is called during a match, which has the effect of limiting the amount of backtracking that can take place. For patterns that are not anchored, the count restarts from zero for each position in the subject string.

I think that the unit is therefore something like "Number of backtracking attempts". I'm not sure that it's 1-to-1 with that though.

Here's a demo isolating the error case with a simple "Catastrophic Backtracking" regex:

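<?php

// Use a deliberately tiny limit so the error is easy to trigger.
ini_set('pcre.backtrack_limit', '100');

// Try two subject lengths that straddle the point where the error appears.
for ($len = 1000; $len <= 1001; $len++) {

    // A run of x's with no y: "x+x+y" can never match this subject.
    $x = str_repeat("x", $len);
    $ret = preg_match("/x+x+y/", $x);

    echo "len = " . $len . "\n";
    // preg_match() returns false on error, which echo prints as an empty string.
    echo "preg_match = " . $ret . "\n";
    echo "PREG_BACKTRACK_LIMIT_ERROR = " . PREG_BACKTRACK_LIMIT_ERROR . "\n";
    echo "preg_last_error = " . preg_last_error() . "\n";
    echo "\n";
}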

Run this code here: https://3v4l.org/EpaNC, to get this output:

len = 1000
preg_match = 0
PREG_BACKTRACK_LIMIT_ERROR = 2
preg_last_error = 0

len = 1001
preg_match = 
PREG_BACKTRACK_LIMIT_ERROR = 2
preg_last_error = 2
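
The demo reads preg_last_error() after preg_match(). For preg_replace(), which the question uses, the failure is easy to miss because the function just returns null. Here is a minimal sketch of a wrapper that surfaces it; replace_or_fail() is a hypothetical helper name of my own, not something PHP provides:

```php
<?php

/**
 * Hypothetical helper (not part of PHP): runs preg_replace() and turns a
 * silent null result into an exception carrying the PCRE error code.
 */
function replace_or_fail(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);

    // preg_replace() returns null when PCRE reports an error.
    if ($result === null) {
        $code = preg_last_error();
        $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'pcre.backtrack_limit exceeded'
            : 'PCRE error code ' . $code;
        throw new RuntimeException($reason, $code);
    }

    return $result;
}

// Normal case: the replacement goes through untouched.
echo replace_or_fail('/x+/', 'y', 'axxxb'), "\n"; // prints "ayb"
```

Throwing is just one design choice here; logging the code and falling back to the original subject would work as well, depending on the application.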
Rich
1

I don't know if this will help: according to PCRE's source code, this error code is raised when PCRE triggers a PCRE_ERROR_MATCHLIMIT. And according to this PCRE changelog, this is probably your fault because your regex is probably causing a memory leak.

I'd suggest reviewing your regex as the best way to solve your problem. Otherwise, if you insist on making it work as is, you can do something like this (though I don't recommend it): ini_set('pcre.backtrack_limit', PHP_INT_MAX);

[edit] I believe this setting exists to cap how much heavy processing PCRE will do, which is why I suggest reviewing your regex to make it lighter (split it into multiple regexes, make several passes over your data, etc.).
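
As a rough illustration of what "making the regex lighter" can look like, here is a sketch using the x+x+y pattern from the demo in the other answer. The rewrite with a possessive quantifier is my own assumption about an equivalent pattern ("two or more x, then y"), and the subjects are made up for the comparison:

```php
<?php

// Shape from the demo: two adjacent greedy x+ groups can split a run of x's
// in many ways, and each split is retried when the final y is missing.
$greedy = '/x+x+y/';

// Assumed-equivalent rewrite: "two or more x, then y", with a possessive
// quantifier ({2,}+) so the engine never gives back x's it has consumed.
$possessive = '/x{2,}+y/';

// Short subjects, so both patterns stay far below any backtrack limit.
foreach (['xxy', 'xy', 'aaxxxxyb', 'xxxx'] as $subject) {
    $a = preg_match($greedy, $subject);
    $b = preg_match($possessive, $subject);
    echo $subject, ': greedy=', $a, ' possessive=', $b, "\n";
}
```

Both patterns should agree on every subject; the difference is only in how much work the engine does on long runs of x that are not followed by y. Whether a similar rewrite is possible for a real pattern depends entirely on that pattern.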

  • "this is probably your fault because your regex is probably causing a memory leak." -- I think that changelog entry refers to a specific bug that has been addressed. It is possible to trigger this error with a correct regex that uses too much backtracking. – Rich Nov 04 '16 at 15:33
0

This worked for me: ini_set("pcre.backtrack_limit", "5000000"); I placed it at the start of my mPDF page, and my 276-page document was generated in 1:04 minutes.
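
If you would rather not raise the limit for the whole script, one option is to raise it only around the expensive call and restore it afterwards. This is a minimal sketch; with_backtrack_limit() is a hypothetical helper name, and the closure body stands in for whatever triggers the heavy regex work (the document generation, in my case):

```php
<?php

/**
 * Hypothetical helper: raises pcre.backtrack_limit only while $task runs,
 * then restores the previous value so other code keeps its usual limit.
 */
function with_backtrack_limit(string $limit, callable $task)
{
    // ini_set() returns the old value on success, or false on failure.
    $previous = ini_set('pcre.backtrack_limit', $limit);
    try {
        return $task();
    } finally {
        if ($previous !== false) {
            ini_set('pcre.backtrack_limit', $previous);
        }
    }
}

$before = ini_get('pcre.backtrack_limit');
$during = with_backtrack_limit('5000000', function () {
    // The expensive preg_* calls (or the library call that makes them) go here.
    return ini_get('pcre.backtrack_limit');
});
$after = ini_get('pcre.backtrack_limit');

echo "before=$before during=$during after=$after\n";
```

The finally block restores the old value even if the task throws, which keeps the higher limit from leaking into unrelated code.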

Blaztix