I'm trying to match file contents against anti-virus signatures using PHP regular expressions, but I keep running into this error:
preg_match(): Compilation failed: regular expression is too large at offset 107
Patterns that fail typically look like this:
75633d617313134(?:..){0,27615}75626f756e687228756328692929
I've tried various modifications with the help of https://regex101.com/, but without success. I still get the same error when I reduce the pattern to simply:
(?:.){0,4000}
Can someone explain why? From what I've read on this forum, the limit should be ~65000. And why does it work if I change the quantifier to {0,}?
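For anyone who wants to reproduce this, here is a minimal sketch of how I compare the two variants. The helper name tryCompile and the dummy subject 'abc' are only illustrative, and the outcome will likely depend on the PCRE build in use, so I'm not quoting any expected output here.

```php
<?php
// Try to run a pattern against a dummy subject and record the outcome.
// preg_match() returns false (and raises a warning) when the pattern
// cannot be compiled.
function tryCompile(string $pattern): array
{
    error_clear_last();
    $result = @preg_match($pattern, 'abc');
    $error  = error_get_last();

    return [
        'compiled' => $result !== false,
        'message'  => ($result === false && $error !== null) ? $error['message'] : null,
    ];
}

// The bounded and the unbounded variant from the question.
foreach (['~(?:.){0,4000}~', '~(?:.){0,}~'] as $pattern) {
    $outcome = tryCompile($pattern);
    echo $pattern, ' => ', $outcome['compiled'] ? 'compiled' : $outcome['message'], PHP_EOL;
}
```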
My server is running Apache with PHP 7.2.7. PCRE library version is 8.42 (pcre.backtrack_limit: 1000000, pcre.recursion_limit: 100000).
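In case it helps with comparing setups, this is roughly how those values can be read from within PHP. The function name pcreEnvironment is hypothetical; the constants and ini keys are standard PHP ones. As far as I understand, both ini settings are match-time limits, so I list them only for completeness.

```php
<?php
// Collect the PHP/PCRE details mentioned above.
function pcreEnvironment(): array
{
    return [
        'php_version'     => PHP_VERSION,   // e.g. "7.2.7" on my server
        'pcre_version'    => PCRE_VERSION,  // version string of the bundled PCRE library
        'backtrack_limit' => ini_get('pcre.backtrack_limit'),
        'recursion_limit' => ini_get('pcre.recursion_limit'),
    ];
}

print_r(pcreEnvironment());
```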
The original patterns come from ClamAV's anti-virus database and are supposedly designed for the regex.c library. To get them working with PHP/PCRE a conversion is needed, and rewriting each pattern by hand is not feasible. Recompiling PHP to increase the PCRE LINK_SIZE is not an option because I'm on shared web hosting.
Currently the conversion uses preg_replace, with ~\{([0-9]+)-([0-9]+)\}~ as the pattern and (?:..){\1,\2} as the replacement.
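As a self-contained sketch, the conversion step looks roughly like this. The wrapper name convertClamavRanges and the shortened sample signature are made up for illustration; only the pattern and the replacement are the ones I actually use.

```php
<?php
// Rewrite ClamAV-style {min-max} wildcards into a PCRE quantifier
// over pairs of characters (one hex-encoded byte each).
function convertClamavRanges(string $signature): string
{
    return preg_replace('~\{([0-9]+)-([0-9]+)\}~', '(?:..){\1,\2}', $signature);
}

// Shortened, made-up signature body for illustration.
echo convertClamavRanges('7563{0-27615}7562'), PHP_EOL;
// prints 7563(?:..){0,27615}7562
```

The converted result is what later gets handed to preg_match(), which is where the compilation error is raised.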
My original question was how PCRE can conclude that even the simplified pattern above is too large. Ultimately, though, the goal is to change or fix the converted patterns so that they work for their intended purpose.
The post "Why am I being warned that my regular expression is too large?" explains parts of this, but does not fully identify the root cause or a solution.